Abstract - We investigate spatially coupled code ensembles. For
transmission over the binary erasure channel, it was recently
shown that spatial coupling increases the belief propagation
threshold of the ensemble to essentially the maximum a posteriori
(MAP) threshold of the underlying component ensemble. This explains
why convolutional LDPC ensembles, originally introduced by
Felström and Zigangirov, perform so well over this channel.

We show that the equivalent result holds true for transmission 
over general binary-input memoryless output-symmetric chan- 
nels. More precisely, given a desired error probability and a gap 
to capacity, we can construct a spatially coupled ensemble which 
fulfills these constraints universally on this class of channels under 
belief propagation decoding. In fact, most codes in that ensemble 
have that property. The quantifier universal refers to the single 
ensemble/code which is good for all channels but we assume that 
the channel is known at the receiver. 

The key technical result is a proof that under belief propaga- 
tion decoding spatially coupled ensembles achieve essentially the 
area threshold of the underlying uncoupled ensemble. 

We conclude by discussing some interesting open problems. 



I. Introduction 

A. Historical Perspective 

Ever since the publication of Shannon's seminal paper [1]
and the introduction of the first coding schemes by Hamming [2]
and Golay [3], coding theory has been concerned
with finding low-delay and low-complexity capacity-achieving
schemes. The interested reader can find an excellent historical
review in [4]. Let us just briefly mention some of the highlights
before focusing on those parts that are the most relevant for
our purpose.

In the first 50 years, coding theory focused on the construction
of algebraic coding schemes and algorithms that
were capable of exploiting the algebraic structure. Two early
highlights of this line of research were the introduction of
Bose-Chaudhuri-Hocquenghem (BCH) codes [5], [6] as well
as Reed-Solomon (RS) codes [7]. Berlekamp devised an
efficient decoding algorithm [8] and this algorithm was then
interpreted by Massey as an algorithm for finding the shortest
feedback-shift register that generates a given sequence [9].
More recently, Sudan introduced a list decoding algorithm
for RS codes that decodes beyond the guaranteed error-correcting
radius [10]. Guruswami and Sudan improved upon
this algorithm [11] and Koetter and Vardy showed how to
handle soft information [12].



Another important branch started with the introduction of
convolutional codes [13] by Elias and the introduction of the
sequential decoding algorithm by Wozencraft [14]. Viterbi
introduced the Viterbi algorithm [15]. It was shown to be
optimal by Forney [16] and Omura [17] and to be eminently
practical by Heller [18], [19].

An important development in transmission over the continuous-input,
band-limited, additive white Gaussian noise channel
was the invention of lattice codes. It was shown in [20]-[24]
that lattice codes achieve the Shannon capacity. A breakthrough
in bandwidth-limited communications came about
when Ungerboeck [25]-[27] invented a technique to combine
coding and modulation. Ungerboeck's technique ushered in a
new era of fast modems. The technique, called trellis-coded
modulation (TCM), offered significant coding gains without
compromising bandwidth efficiency by mapping binary code
symbols, generated by a convolutional encoder, to a larger
(non-binary) signal constellation. In [28], [29] Forney showed
that lattice codes as well as TCM schemes may be generated
by the same basic elements and the generalized technique was
termed coset-coding.

Coming back to binary linear codes, in 1993, Berrou,
Glavieux and Thitimajshima [30] proposed turbo codes. These
codes attain near-Shannon limit performance under low-complexity
iterative decoding. Their remarkable performance
led to a flurry of research on the "turbo" principle. Around
the same time, Spielman in his thesis [31], [32] and MacKay
and Neal in [33]-[36] independently rediscovered low-density
parity-check (LDPC) codes and iterative decoding, both introduced
in Gallager's remarkable thesis [37]. Wiberg showed
[38] that both turbo codes and LDPC codes fall under the
umbrella of codes based on sparse graphs and that their
iterative decoding algorithms are special cases of the sum-product
algorithm. This line of research was formalized by
Kschischang, Frey, and Loeliger who introduced the notion of
factor graphs [39].

The next breakthrough in the design of codes (based on
sparse graphs) came with the idea of using irregular LDPC
codes by Luby, Mitzenmacher, Shokrollahi and Spielman
[40], [41]. With this added ingredient it became possible to
construct irregular LDPC codes that achieved performance
within 0.0045dB of the Shannon limit when transmitting
over the binary-input additive white Gaussian noise channel,
see Chung, Forney, Richardson and Urbanke [42]. The
development of these codes went hand in hand with the
development of a systematic framework for their analysis by
Luby, Mitzenmacher, Shokrollahi and Spielman [43], [44] and
Richardson and Urbanke [45].

A central research topic for codes on graphs is the interaction
of the graphical structure of a code and its performance.
Turbo codes themselves are a prime example of how the "right"
structure is important to achieve good performance [30].
Further important parameters and structures are the degree
distribution (dd) and in particular the fraction of degree-two
variable nodes, multi-edge ensembles [46], degree-two nodes
in a chain [47], and protographs [48], [49].

Currently sparse graph codes and their associated iterative 
decoding algorithms are the best "practical" codes in terms of 
their trade-off between performance and complexity and they 
are part of essentially all new communication standards. 

Polar codes represent the most recent development in coding
theory [50]. They are provably capacity achieving on
binary-input memoryless output-symmetric (BMS) channels 
(and many others) and they have low decoding complexity. 
They also have no error floor due to a minimum distance 
which increases like the square root of the blocklength. The 
simplicity, elegance, and wide applicability of polar codes have 
made them a popular choice in the recent literature. There 
are perhaps only two areas in which polar codes could be 
further improved. First, for polar codes the convergence of 
their performance to the asymptotic limit is slow. Currently no 
rigorous statements regarding this convergence for the general 
case are known. But "calculations" suggest that, for a fixed 
desired error probability, the required blocklength scales like
$1/\delta^{\mu}$, where $\delta$ is the additive gap to capacity and where $\mu$
depends on the channel and has a value around 4 [51], [52].
Note that random block codes under MAP decoding have
a similar scaling behavior but with $\mu = 2$. This implies a
considerably faster convergence to the asymptotic behavior.
The value 2 is a lower bound for $\mu$ for any system since the
variations of the channel itself imply that $\mu \geq 2$. The second
aspect is universality: the code design of polar codes depends 
on the specific channel being used and one and the same design 
cannot simultaneously achieve capacity over a non-trivial class 
of channels (under successive cancellation decoding). 

Let us now connect the content of this paper to the previous 
discussion. Our main aim is to explain the role of a further 
structural element in the realm of sparse graph codes (besides 
the previously discussed such examples), namely that of 
"spatial coupling." We will show that this coupling of graphs 
leads to a remarkable change in their performance. Ensembles 
designed in this way combine some of the nice elements of 
polar codes (namely the fact that they are provably capacity 
achieving under low complexity decoding) with the practical 
advantages of sparse graph codes (the codes are competitive 
already for moderate lengths). Perhaps most importantly, it 
is possible to construct universal such codes for the whole 
class of BMS channels. Here, universality refers to the fact 
that one and the same ensemble is good for a whole class of 
channels, assuming that at the receiver we have knowledge of 
the channel. 



B. Prior Work on Spatially Coupled Codes 

The potential of spatially coupled codes has long been rec- 
ognized. Our contribution lies therefore not in the introduction 
of a new coding scheme, but in clarifying the mechanisms that
make these codes perform so well.

The term spatially coupled codes was coined in [53].
Convolutional LDPC codes (more precisely, terminated convolutional
LDPC codes), which were introduced by Felström and
Zigangirov in [54], and their many variants belong to this class.
Why do we introduce a new term? The three perhaps most 
important reasons are: (i) the term "convolutional" conjures 
up a fairly specific node interconnection structure whereas 
experiments have shown that the particular nature of the 
connection is not important and that the threshold saturation 
effect occurs as soon as the connection is sufficiently strong; 
(ii) a well known result for convolutional codes says that the 
boundary conditions are "forgotten" exponentially fast; but for 
spatially coupled codes it is exactly the boundary condition 
which causes the effect and there is no decay of this effect 
in the spatial dimension of the code; (iii) the same effect 
has (empirically) been shown to hold in many other graphical 
models, most of them outside the realm of coding; the term 
"spatial coupling" is perhaps then somewhat more generally 
applicable. 

There is a considerable literature on convolutional-like
LDPC ensembles. Variations on the constructions as well
as some analysis can be found in Engdahl and Zigangirov
[55], Engdahl, Lentmaier, and Zigangirov [56], Lentmaier,
Truhachev, and Zigangirov [57], as well as Tanner, D. Sridhara,
A. Sridharan, Fuja, and Costello [58].

In [59], [60], Sridharan, Lentmaier, Costello and Zigangirov
consider density evolution (DE) analysis for convolutional
LDPC ensembles and determine thresholds for the BEC.
The equivalent results for general channels were reported by
Lentmaier, Sridharan, Zigangirov and Costello in [60], [61].
This DE analysis is in many ways the starting point for our
investigation. By comparing the thresholds to the thresholds of
the underlying ensembles under MAP decoding (see e.g. [62]),
it quickly becomes apparent that an interesting effect must be
at work. Indeed, in a recent paper [63], Lentmaier and Fettweis
followed this route and independently formulated the equality 
of the belief propagation (BP) threshold of convolutional 
LDPC ensembles and the MAP threshold of the underlying 
ensemble as a conjecture. 

A representation of convolutional LDPC ensembles in terms
of a protograph was introduced by Mitchell, Pusane, Zigangirov
and Costello [64]. The corresponding representation for
terminated convolutional LDPC ensembles was introduced by
Lentmaier, Fettweis, Zigangirov and Costello [65]. A variety
of constructions of LDPC convolutional codes from the graph-cover
perspective is shown by Pusane, Smarandache, Vontobel,
and Costello [66].

A pseudo-codeword analysis of convolutional LDPC codes
was performed by Smarandache, Pusane, Vontobel, and
Costello in [66]-[68]. Such an analysis is important if we
want to understand the error-floor behavior of spatially coupled 
ensembles. 






In [69], Papaleo, Iyengar, Siegel, Wolf, and Corazza study
the performance of windowed decoding of convolutional
LDPC codes on the BEC. Such a decoder has a decoding
complexity which is independent of the chain length, an
important practical advantage. Luckily, it turns out that the
performance under windowed decoding, when measured in
terms of the threshold, approaches the "regular" (without
windowed decoding) threshold exponentially fast in the window
size, see [70], [71]. The threshold saturation phenomenon
therefore does not require an infinite window size.

The scaling behavior of spatially coupled ensembles, i.e., 
the relationship between the chain length, the number of 
variables per section, and the error probability is discussed 
by Olmos and Urbanke in [72].

C. Prior Results for the Binary Erasure Channel 

It was recently shown in [53] that for transmission over the
BEC spatially coupled ensembles have a BP threshold which
is essentially equal to the MAP threshold of the underlying
uncoupled ensemble. Further, this threshold is also essentially
equal to the MAP threshold of the coupled ensemble. This
phenomenon was called threshold saturation in [53] since the
BP threshold takes on its largest possible value (the MAP
threshold). This significant improvement in the performance
is due to the spatial coupling of the underlying code. Those
"sections" of the code that have already succeeded in decoding
can help their neighboring less fortunate sections in the decoding
process. In this manner, the information propagates from
the "boundaries", where the bits are known perfectly, towards
the "middle". In a recent paper [63], Lentmaier and Fettweis
independently formulated the same statement as a conjecture
and provided numerical evidence for its validity. They attribute
the observation of the equality of the two thresholds to G. Liva.

It was shown in (64), E3, |68), d that if we couple 
component codes whose Hamming distance grows linearly in 
the blocklength then also the resulting coupled ensembles have 
this property (assuming that the number of "sections" or copies 
of the underlying code is kept fixed). The equivalent result is 
true for stopping sets. This implies that for the transmission 
over the BEC the block BP threshold is equal to the bit BP 
threshold and that such ensembles do not exhibit error floors 
under BP decoding. 

D. Prior Results for General Binary-Input Memoryless 
Output-Symmetric Channels 

As pointed out in a preceding section, BP thresholds for 
transmission over general BMS channels were computed by 
means of a numerical procedure by Lentmaier, Sridharan, 
Zigangirov and Costello in [61]. Further, in [74] (conjectured)
MAP thresholds for some LDPC ensembles were computed 
according to the Maxwell construction. Comparing these two 
values, one can check empirically that also for transmission 
over general BMS channels the BP threshold of the coupled 
ensembles is essentially equal to the (conjectured) MAP 
threshold of the underlying ensemble. Indeed, recently both 
[75] as well as [76] provided further numerical evidence that



the threshold saturation phenomenon also applies to general 
BMS channels. 

For typical sparse graph ensembles the MAP threshold is not 
equal to the Shannon threshold but the Shannon threshold can 
only be reached by taking a sequence of such ensembles (e.g., 
a sequence of increasing degrees). There are some notable 
exceptions, like MN ensembles or HA ensembles. Kasai and 
Sakaniwa take this as a starting point to investigate in [77]
whether by spatially coupling such ensembles it is possible 
to create ensembles which are universally capacity achieving 
under BP decoding. 

E. Spatial Coupling for General Communication Scenarios, 
Signal Processing, Computer Science, and Statistical Physics 

The principle which underlies the good performance of 
spatially coupled ensembles is broad. It has been shown to 
apply to a variety of problems in communications, computer 
science, signal processing, and physics. To mention some 
concrete examples, the threshold saturation effect (dynami- 
cal/algorithmic threshold of the system being equal to the 
static or condensation threshold) of coupled graphical models 
has been observed for rate-less codes by Aref and Urbanke
[78], for channels with memory and multiple access channels
with erasure by Kudekar and Kasai [79], [80], for CDMA
channels by Takeuchi, Tanaka, and Kawabata [81], for relay
channels with erasure by Uchikawa, Kasai, and Sakaniwa
[82], for the noisy Slepian-Wolf problem by Yedla, Pfister,
and Narayanan [83], and for the BEC wiretap channel by
Rathi, Urbanke, Andersson, and Skoglund [84]. Uchikawa,
Kurkoski, Kasai, and Sakaniwa recently showed that the
BP threshold also improves for transmission over
the unconstrained AWGN channel using low-density lattice
codes [85]. Further, Yedla, Nguyen, Pfister and Narayanan
demonstrated the universality of spatially-coupled codes in
the 2-user binary input Gaussian multiple-access channel and
finite state ISI channels like the dicode-erasure channel and
the dicode channel with AWGN [86], [87]. In [86] they
show in addition that for a fixed rate pair, spatially-coupled
ensembles universally saturate the achievable region (i.e., the
set of channel gain parameters that are achievable for the fixed
rate pair) under BP decoding. Similarly, in [87] they provide
numerical evidence that spatially coupled ensembles achieve
the symmetric information rate for the dicode erasure channel
and the dicode channel with AWGN.

In signal processing and computer science spatial coupling
has found success in the field of compressed sensing [88]-[91].
In [88], Kudekar and Pfister use sparse measurement
matrices with sub-optimal verification decoding and show that
spatial coupling boosts thresholds of sparse recovery. In [90],
[91], Krzakala, Mézard, Sausset, Sun, and Zdeborová as well
as Donoho, Javanmard, and Montanari show that by carefully
designing dense measurement matrices using spatial coupling
one can achieve the best possible recovery threshold, i.e., the
one achieved by the optimal $\ell_0$ decoder. Thus, the phenomenon
of threshold saturation is also demonstrated in this case. This
development is quite remarkable.

Statistical physics is another very natural area in which
the threshold saturation phenomenon is of interest. For the
so-called random K-SAT problem, random graph coloring,
and the Curie-Weiss model, spatially coupled ensembles were
investigated by Hassani, Macris, and Urbanke [92]-[94]. In
all these cases, the threshold saturation phenomenon was 
observed. This suggests that it might be possible to study 
difficult theoretical problems in this area, like the existence 
of the static threshold, by studying the dynamical threshold 
of a chain of coupled models, perhaps an easier problem. 
Further spatially-coupled models were considered by Takeuchi 
and Tanaka 



F. Main Results and Consequences 

In this paper we show that for transmission over general 
BMS channels coupled ensembles exhibit the threshold sat- 
uration phenomenon. By choosing e.g. regular component 
ensembles of fixed rate and increasing degree, this implies 
that coupled ensembles can achieve capacity over this class 
of channels. More precisely, for each $\delta > 0$ there exists a
coupled ensemble which achieves at least a fraction $1 - \delta$ of
capacity universally, under belief propagation decoding, over 
the whole class of BMS channels. The qualifier "universal" is 
important here. 

Coupled ensembles inherit to a large degree the error floor 
behavior of the underlying ensemble. Further, such an ensem- 
ble can be chosen so that it has a non-zero error correcting 
radius, and hence does not exhibit error floors. To achieve 
this, it suffices to take the variable-node degree to be at least 
five. This guarantees that a randomly chosen graph from such 
an ensemble is an expander with expansion exceeding three- 
quarters with high probability. This expansion guarantees an 
error correcting radius under the so-called flipping decoder 
as well as under the BP decoder, assuming that we 
suitably clip both the received as well as the internal messages
[97].

Although one can empirically observe the threshold sat- 
uration phenomenon for a wide array of component codes, 
we state and prove the main result only for regular LDPC 
ensembles. This keeps the exposition manageable. 

G. Outline 

In Section II we briefly review regular LDPC ensembles
and their asymptotic (in the blocklength) analysis. Much of
this material is standard and we only include it here to set
the notation and to make the paper largely self-contained. The
two most important exceptions are our in-depth discussion of
the Wasserstein distance and the so-called area threshold,
in particular the (Negativity) Lemma 27.

In Section III we review some basic properties of coupled
ensembles. Using simple extremes of information combining
techniques, we will see in Section III-G that coupling indeed
increases the BP threshold significantly, even though these 
simple arguments are not sufficient to characterize the BP 
threshold under coupling exactly. 

We state our main result, namely that the BP threshold of 
coupled ensembles is essentially equal to the area threshold 
of the underlying component ensemble, in Section IV. We
also discuss how one can easily strengthen this result to apply 



to individual codes rather than ensembles and how this gives 
rise to codes which are universally close to capacity under BP 
decoding for the whole class of BMS channels. 

We end in Section IV-E with a discussion of what challenges
still lie ahead. In particular, spatial coupling has been shown 
empirically to lead to the threshold saturation phenomenon in 
a wide class of graphical models. Rather than proving each 
such scenario in isolation, we want a common framework to 
analyze all such systems. 

Many of the proofs are relegated to the appendices. This 
makes it possible to read the material on two levels - a casual 
level, skipping all the proofs and following only the flow of the 
argument, and a more detailed level, consulting the material 
in the appendices. 

II. Uncoupled Systems 

A. Regular Ensembles 

Definition 1 ($(d_l, d_r)$-Regular Ensemble): Fix $3 \leq d_l \leq d_r$,
$d_l, d_r \in \mathbb{N}$, and $n$ so that $n d_l/d_r \in \mathbb{N}$. The $(d_l, d_r)$-regular
LDPC ensemble of blocklength $n$ is defined as follows. There
are $n$ variable nodes and $n \frac{d_l}{d_r}$ check nodes. Each variable node
has degree $d_l$ and each check node has degree $d_r$. Accordingly,
each variable node has $d_l$ sockets, i.e., $d_l$ places to connect
an edge to, and each check node has $d_r$ sockets. Therefore,
there are in total $n d_l$ variable-node sockets and the same
number of check-node sockets. Number both kinds from 1
to $n d_l$. Consider the set of permutations $\Pi$ on $\{1, \dots, n d_l\}$.
Endow this set with a uniform probability distribution. To
sample from the $(d_l, d_r)$-regular ensemble, sample from $\Pi$
and connect the variable to the check node sockets according
to the chosen permutation. This is the configuration model of
LDPC ensembles. It is inspired by the configuration model of
random graphs [98, Section 2.4]. ■
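The sampling procedure of Definition 1 can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `sample_regular_ldpc`, the 0-based socket numbering, and the convention that socket `s` belongs to node `s // degree` are assumptions made for the example.

```python
import random


def sample_regular_ldpc(n, dl, dr, seed=None):
    """Sample one graph from the (dl, dr)-regular ensemble (configuration model).

    Returns a list of (variable_node, check_node) edges. Multiple edges between
    the same pair of nodes can occur, as in the configuration model.
    """
    if (n * dl) % dr != 0:
        raise ValueError("n * dl / dr must be an integer")
    rng = random.Random(seed)
    num_sockets = n * dl
    # Hypothetical numbering: variable-side socket s belongs to variable node
    # s // dl, check-side socket t belongs to check node t // dr.
    perm = list(range(num_sockets))
    rng.shuffle(perm)  # uniformly random permutation of the sockets
    return [(s // dl, perm[s] // dr) for s in range(num_sockets)]


edges = sample_regular_ldpc(n=12, dl=3, dr=6, seed=0)
print(len(edges), "edges,", len({c for _, c in edges}), "check nodes")
```

Because the permutation is a bijection between the two socket sets, every variable node ends up with exactly $d_l$ edges and every check node with exactly $d_r$ edges, regardless of the seed.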

B. Binary-Input Memoryless Output-Symmetric Channels 

Throughout we will assume that transmission is taking place 
over a BMS channel. Let X denote the input and let Y 
be the output. Further, let $p(Y = y \mid X = x)$ denote the
transition probability describing the channel. An alternative
characterization of the channel is by means of its so-called L-distribution,
denote it by $c$. More precisely, $c$ is the distribution
of

$$\ln \frac{p(Y \mid X = 1)}{p(Y \mid X = -1)}$$

conditioned on $X = 1$.

Given $c$, we write $\mathfrak{c}$, $|\mathfrak{c}|$, and $|\mathfrak{C}|$ to denote the corresponding
D-distribution, the $|D|$-distribution, and the cdf in the $|D|$-domain,
respectively, see [62, Section 4.1.4].

Typically we do not consider a single channel in isolation
but a whole family of channels. We write $\{\mathrm{BMS}(\sigma)\}$ to denote
the family parameterized by the scalar $\sigma$. Often it will be more
convenient to denote this family by $\{c_\sigma\}$, i.e., to use the family
of L-densities which characterize the channel family. If it is
important to make the range of the parameter $\sigma$ explicit, we
will write $\{c_\sigma\}_{\underline{\sigma}}^{\overline{\sigma}}$.

Sometimes it is convenient to use the natural parameter
of the family. For example, for the three fundamental channels,
the BEC, the binary symmetric channel (BSC) and the
binary additive white-Gaussian noise channel (BAWGNC),
the corresponding channel families are given by $\{\mathrm{BEC}(\epsilon)\}_0^1$,
$\{\mathrm{BSC}(p)\}_0^{1/2}$, and $\{\mathrm{BAWGNC}(\sigma)\}_0^{\infty}$. Other times, it is more
convenient to use a common parameterization. E.g., we will
write $\{\mathrm{BMS}(h)\}$ to denote a channel family where $\mathrm{BMS}(h)$
denotes the element in the family of entropy $h$.

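Assume that we are given a channel family $\{\mathrm{BMS}(\sigma)\}_{\underline{\sigma}}^{\overline{\sigma}}$.
We say that the family is complete if $H(\mathrm{BMS}(\underline{\sigma})) = 0$,
$H(\mathrm{BMS}(\overline{\sigma})) = 1$, and for each $h \in [0, 1]$ there exists a
parameter $\sigma$ so that $H(\mathrm{BMS}(\sigma)) = h$. Here $H(\cdot)$ is the entropy
functional defined in Section II-D.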

Let $p_{Z|X}(z \mid x)$ denote the transition probability associated
to a BMS channel $c'$ and let $p_{Y|X}(y \mid x)$ denote the transition
probability of another BMS channel $c$. We then say that $c'$ is
degraded with respect to $c$ if there exists a channel $p_{Z|Y}(z \mid y)$
so that

$$p_{Z|X}(z \mid x) = \sum_{y} p_{Y|X}(y \mid x)\, p_{Z|Y}(z \mid y).$$

We will use the notation $c \prec c'$ to denote that $c'$ is degraded
wrt $c$ (as a mnemonic think of $c$ as the erasure probability of
a BEC and replace $\prec$ with $<$).
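As a concrete instance of this definition, the sketch below concatenates two binary symmetric channels. The helper names `concatenate` and `bsc` and the chosen crossover probabilities are assumptions made for illustration; the only fact used is the sum formula for $p_{Z|X}$ above.

```python
def concatenate(p_y_given_x, p_z_given_y):
    """Return p(z|x) = sum_y p(y|x) p(z|y) for row-stochastic channel matrices."""
    rows = len(p_y_given_x)
    inner = len(p_z_given_y)
    cols = len(p_z_given_y[0])
    return [
        [sum(p_y_given_x[x][y] * p_z_given_y[y][z] for y in range(inner))
         for z in range(cols)]
        for x in range(rows)
    ]


def bsc(p):
    """Transition matrix of a binary symmetric channel with crossover p."""
    return [[1 - p, p], [p, 1 - p]]


p, q = 0.05, 0.1
degraded = concatenate(bsc(p), bsc(q))  # BSC(p) followed by BSC(q)
crossover = degraded[0][1]
print(crossover)
```

The concatenation is again a BSC, with crossover probability $p(1-q) + q(1-p) \geq p$ for $p \leq 1/2$, so the resulting channel is degraded with respect to $\mathrm{BSC}(p)$ in the sense defined above.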

A useful characterization of degradation, see [62, Theorem
4.74], is that $c \prec c'$ is equivalent to

$$\int_0^1 f(x)\,|\mathfrak{c}|(x)\,dx \leq \int_0^1 f(x)\,|\mathfrak{c}'|(x)\,dx \tag{1}$$

for all $f(x)$ that are non-increasing and concave on $[0, 1]$.
Here, $|\mathfrak{c}|(x)$ is the so-called $|D|$-density associated to the L-density
$c$, see [62, p. 179]. In particular, this characterization
implies that $F(a) \leq F(b)$ for $a \prec b$ if $F(\cdot)$ is either the
Bhattacharyya or the entropy functional. This is true since both
are linear functionals of the distributions and their respective
kernels in the $|D|$-domain are decreasing and concave. An
alternative characterization in terms of the cumulative distribution
functions $|\mathfrak{C}|(x)$ and $|\mathfrak{C}'|(x)$ is that for all $z \in [0, 1]$,

$$\int_z^1 |\mathfrak{C}|(x)\,dx \leq \int_z^1 |\mathfrak{C}'|(x)\,dx. \tag{2}$$

A BMS channel family $\{\mathrm{BMS}(\sigma)\}_{\underline{\sigma}}^{\overline{\sigma}}$ is said to be ordered
by degradation if $\sigma_1 \leq \sigma_2$ implies $c_{\sigma_1} \prec c_{\sigma_2}$. (The reverse
order, $\sigma_1 \geq \sigma_2$, is also allowed but we generally stick to the
stated convention.)

We say that an L-density $a$ is symmetric if $a(-y) = a(y)e^{-y}$.
We recall that all densities which stem from BMS
channels are symmetric, see [62, Sections 4.1.4, 4.1.8 and
4.1.9]. All densities which we consider are symmetric. We
will therefore not mention symmetry explicitly in the sequel.

A BMS channel family $\{c_\sigma\}$ is said to be smooth if for
all continuously differentiable functions $f(y)$ so that $e^{y/2} f(y)$
is bounded, the integral $\int f(y)\, c_\sigma(y)\, dy$ exists and is a continuously
differentiable function with respect to $\sigma$, see [62,
Definition 4.32].

The three fundamental channel families $\{\mathrm{BEC}(\epsilon)\}_0^1$,
$\{\mathrm{BSC}(p)\}_0^{1/2}$, and $\{\mathrm{BAWGNC}(\sigma)\}_0^{\infty}$ are all complete, ordered,
smooth, and symmetric.

C. MAP Decoder and MAP Threshold 

The bit maximum a posteriori (bit-MAP) decoder for bit $i$
finds the value of $x_i$ which maximizes $p(x_i \mid y_1^n)$. It minimizes
the bit error probability and is optimal in this sense. The
block maximum a posteriori (block-MAP) decoder finds the
codeword $x_1^n$ which maximizes $p(x_1^n \mid y_1^n)$. It minimizes the
block error probability and is optimal in this sense.

Definition 2 (MAP Threshold): Consider an ordered and
complete channel family $\{c_h\}$. The MAP threshold of the
$(d_l, d_r)$-regular ensemble for this channel family is denoted
by $h^{\mathrm{MAP}}(d_l, d_r)$ and defined by

$$\inf\{h \in [0, 1] : \liminf_{n \to \infty} E[H(X_1^n \mid Y_1^n(h))/n] > 0\},$$

where $H(X_1^n \mid Y_1^n(h))$ is the conditional entropy of the transmitted
codeword $X_1^n$, chosen uniformly at random from the
code, given the received message $Y_1^n(h)$ and where the expectation
$E[\cdot]$ is wrt the $(d_l, d_r)$-regular ensemble. ■
Discussion: Define $P_{e,i} = \Pr\{X_i \neq \hat{X}_i(Y_1^n)\}$, where $\hat{X}_i(Y_1^n)$
is the MAP estimate of bit $i$ based on the observation $Y_1^n$. Note
that by the Fano inequality we have $H(X_i \mid Y_1^n) \leq h_2(P_{e,i})$.
Assume that we are transmitting above $h^{\mathrm{MAP}}(d_l, d_r)$ so that
$E[H(X_1^n \mid Y_1^n)/n] \geq \delta > 0$.¹ Then

$$h_2\Big(E\Big[\frac{1}{n}\sum_{i=1}^{n} P_{e,i}\Big]\Big) \geq E\Big[\frac{1}{n}\sum_{i=1}^{n} h_2(P_{e,i})\Big] \geq E\Big[\sum_{i=1}^{n} H(X_i \mid Y_1^n)/n\Big] \geq E[H(X_1^n \mid Y_1^n)/n] \geq \delta > 0.$$

In words, if we are transmitting above the MAP threshold, then
the ensemble average bit-error probability is lower bounded by
$h_2^{-1}(\delta)$, a strictly positive constant. This ensemble is therefore
not suitable for reliable transmission above this threshold.
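To get a feeling for the size of this lower bound, the following sketch inverts the binary entropy function numerically. The function names `h2` and `h2_inv`, the bisection tolerance, and the sample value of $\delta$ are assumptions made for the illustration.

```python
import math


def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def h2_inv(h, tol=1e-12):
    """Inverse of h2 restricted to [0, 1/2], computed by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h2(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


delta = 0.1  # hypothetical residual normalized conditional entropy
bit_error_floor = h2_inv(delta)
print(bit_error_floor)
```

Since $h_2$ is increasing on $[0, 1/2]$, bisection is sufficient here. Any ensemble average bit-error probability not exceeding $1/2$ must be at least the printed value whenever the normalized conditional entropy is at least `delta`.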

In general we cannot conclude from $E[H(X_1^n \mid Y_1^n)/n] \leq \delta$
that the average error probability is small.²

¹ We have $E[H(X_1^n \mid Y_1^n)/n] \geq \frac{1}{2} \liminf_{n \to \infty} \frac{1}{n} E[H(X_1^n \mid Y_1^n(h))] > 0$
for all $n \geq n_0$, let's say. Further, for $1 \leq n < n_0$, $E[H(X_1^n \mid Y_1^n)/n]$ is
strictly positive unless the channel is trivial. The claim follows by taking the
minimum of all of the bounds for $1 \leq n < n_0$ as well as the bound for
$n \geq n_0$.

² This is possible if we have the slightly stronger condition
$E[\sum_{i=1}^{n} H(X_i \mid Y_1^n)/n] \leq \delta$. In this case
$\delta \geq \frac{1}{n} E[\sum_{i=1}^{n} H(X_i \mid Y_1^n)] = \frac{1}{n} \sum_{i=1}^{n} E[E_{Y_1^n}[h_2(\min_x p(x \mid Y_1^n))]] \geq \frac{1}{n} \sum_{i=1}^{n} E[E_{Y_1^n}[2 \min_x p(x \mid Y_1^n)]] = \frac{1}{n} E[\sum_{i=1}^{n} 2 P_{e,i}]$,
so that $\frac{1}{n} E[\sum_{i=1}^{n} P_{e,i}] \leq \frac{1}{2}\delta$. The last step in the previous chain of inequalities
follows since under MAP decoding the error probability conditioned that we
observed $y_1^n$ is equal to $\min_x p(x \mid y_1^n)$. An alternative way to prove this
is to realize that $H(X_i \mid Y_1^n)$ represents a BMS channel with a particular
entropy and to use extremes of information combining to find the worst error
probability such a channel can have. The extremal channel in this case is the
BEC.

D. Belief Propagation, Density Evolution, and Some Important Functionals

In principle one can investigate the behavior of coupled
ensembles under any message-passing algorithm. We limit
our investigation to the analysis of the BP decoder, the most
powerful local message-passing algorithm. We are interested
in the asymptotic performance of the BP decoder, i.e., the
performance when the blocklength $n$ tends to infinity. This
asymptotic performance is characterized by the so-called density
evolution (DE) equation [45].

Definition 3 (Density Evolution): For $\ell \geq 1$, the DE equation
for a $(d_l, d_r)$-regular ensemble is given by

$$x_\ell = c \circledast (x_{\ell-1}^{\boxast d_r - 1})^{\circledast d_l - 1}.$$

Here, $c$ is the L-density of the BMS channel over which
transmission takes place and $x_\ell$ is the density emitted by
variable nodes in the $\ell$-th round of density evolution. Initially
we have $x_0 = \Delta_0$, the delta function at 0. The operators $\circledast$
and $\boxast$ correspond to the convolution of densities at variable
and check nodes, respectively, see [62, Section 4.1.4]. ■
As mentioned, all distributions associated to BMS channels
are symmetric and symmetry is preserved under DE, see [62,
Chapter 4] for details. A number of functionals of
densities are of interest to us. The most important functionals
are the Bhattacharyya, the entropy, and the error probability
functional. For a density $a$ these are denoted by $\mathfrak{B}(a)$, $H(a)$,
and $\mathfrak{E}(a)$, respectively. Assuming $a$ is an L-density, they are
given by

$$\mathfrak{B}(a) = \int a(y)\, e^{-y/2}\, dy, \qquad H(a) = \int a(y) \log_2(1 + e^{-y})\, dy,$$
$$\mathfrak{E}(a) = \frac{1}{2} \int a(y)\, e^{-(y/2 + |y/2|)}\, dy.$$

We end this section with the following useful fact. The proof
can be found in Appendix A.

Lemma 4 (Entropy versus Bhattacharyya): For any L-density
$a$, $\mathfrak{B}^2(a) \leq H(a) \leq \mathfrak{B}(a)$.
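The three functionals and Lemma 4 can be checked numerically for concrete channels. The sketch below is an illustration under stated assumptions: the function names, the trapezoidal integration grid, and the sample parameters are choices made for the example. It uses the two-point L-density of the BSC and the fact that the L-density of the BAWGNC with noise standard deviation $\sigma$ is Gaussian with mean $2/\sigma^2$ and variance $4/\sigma^2$.

```python
import math


def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_functionals(p):
    """Return (Bhattacharyya, entropy, error probability) of BSC(p), 0 < p <= 1/2.

    The L-density has mass 1 - p at +log((1 - p)/p) and mass p at
    -log((1 - p)/p), so each integral reduces to a two-term sum.
    """
    llr = math.log((1 - p) / p)
    masses = [(llr, 1 - p), (-llr, p)]
    batt = sum(w * math.exp(-y / 2) for y, w in masses)
    ent = sum(w * math.log2(1 + math.exp(-y)) for y, w in masses)
    err = sum(w * 0.5 * math.exp(-(y / 2 + abs(y / 2))) for y, w in masses)
    return batt, ent, err


def bawgnc_functionals(sigma, num_points=20001, width=12.0):
    """Integrate the three kernels against the Gaussian L-density (trapezoidal rule)."""
    mean = 2.0 / sigma ** 2
    std = 2.0 / sigma
    lo = mean - width * std
    step = 2.0 * width * std / (num_points - 1)
    batt = ent = err = 0.0
    for k in range(num_points):
        y = lo + k * step
        w = step * (0.5 if k in (0, num_points - 1) else 1.0)
        dens = math.exp(-((y - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))
        batt += w * dens * math.exp(-y / 2)
        # log2(1 + e^{-y}) evaluated in a numerically stable way
        ent += w * dens * (max(-y, 0.0) + math.log1p(math.exp(-abs(y)))) / math.log(2)
        err += w * dens * 0.5 * math.exp(-(y / 2 + abs(y / 2)))
    return batt, ent, err


for name, (batt, ent, err) in [
    ("BSC(0.11)", bsc_functionals(0.11)),
    ("BAWGNC(0.9787)", bawgnc_functionals(0.9787)),
]:
    print(name, batt, ent, err, batt ** 2 <= ent <= batt)
```

For the BSC the sums collapse to the closed forms $2\sqrt{p(1-p)}$, $h_2(p)$, and $p$. For the BAWGNC the Bhattacharyya functional has the closed form $e^{-1/(2\sigma^2)}$, which gives an independent check of the numerical integration.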

E. Extremes of Information Combining and the Duality Rule 

When we are operating on BMS channels, the quantities 
appearing in the DE equations are distributions. These are hard 
to track analytically in general, unless we are transmitting over 
the BEC. Often we only need bounds. In these cases extremes 
of information combining ideas are handy, see [99]-[103], [62, 
p. 242]. 

Lemma 5 (Extremes of Information Combining): Let $F(\cdot)$ 
denote either $H(\cdot)$ or $\mathfrak{B}(\cdot)$ and let $\alpha \in [0, 1]$. Let $a_{BEC}$ and $a_{BSC}$ 
denote L-densities from the families {BEC($\epsilon$)} and {BSC(p)}, 
respectively, so that $F(a_{BEC}) = F(a_{BSC}) = \alpha$. Then for any b, 

(i) $\min_{a: F(a)=\alpha} F(a \circledast b) = F(a_{BEC} \circledast b)$ 

(ii) $\max_{a: F(a)=\alpha} F(a \circledast b) = F(a_{BSC} \circledast b)$ 

(iii) $\min_{a: F(a)=\alpha} F(a \boxast b) = F(a_{BSC} \boxast b)$ 

(iv) $\max_{a: F(a)=\alpha} F(a \boxast b) = F(a_{BEC} \boxast b)$ 

Discussion: Although the extremes of information combining 
bounds are only stated for pairs of distributions, they naturally 
extend to more than two distributions. E.g., we claim that 
$\min_{a: F(a)=\alpha} F(a^{\circledast d}) = F(a_{BEC}^{\circledast d}) = \alpha^d$. To see this, let $\{a_i\}_{i=1}^{d}$ 
be any set of distributions so that $F(a_i) = \alpha$. Then we can 
use Lemma 5 repeatedly to conclude that 

$F(a_1 \circledast (\circledast_{i=2}^{d} a_i)) \geq F(a_{BEC} \circledast (\circledast_{i=2}^{d} a_i))$ 
$\quad = F(a_2 \circledast (a_{BEC} \circledast (\circledast_{i=3}^{d} a_i)))$ 
$\quad \geq F(a_{BEC} \circledast (a_{BEC} \circledast (\circledast_{i=3}^{d} a_i)))$ 
$\quad \geq \cdots \geq F(a_{BEC} \circledast (a_{BEC}^{\circledast d-1})) = \alpha^d.$ 

The same remark and the same proof technique apply to the 
other cases. 

Lemma 6 (Duality Rule - [62, p. 196]): For any a and b, 
$H(a \circledast b) + H(a \boxast b) = H(a) + H(b)$. 

Note: We give a simple proof of this identity at the end of the 
proof of Lemma 53. 
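The duality rule is easy to check numerically for densities with finitely many point masses. The sketch below works in the D-domain, where a mass at LLR $l$ sits at $\tanh(l/2)$; variable-node convolution then combines two values as $(d_1+d_2)/(1+d_1 d_2)$, check-node convolution as $d_1 d_2$, and the entropy of a mass at $d$ is $\log_2\frac{2}{1+d}$. The list-of-pairs representation and all function names are assumptions made for this example only.

```python
import math

def bsc(p):
    """D-domain density of BSC(p), 0 < p < 1/2, as (weight, d) pairs."""
    return [(1.0 - p, 1.0 - 2.0 * p), (p, -(1.0 - 2.0 * p))]

def bec(eps):
    """D-domain density of BEC(eps): erasure at d = 0, perfect knowledge at d = 1."""
    return [(eps, 0.0), (1.0 - eps, 1.0)]

def var_conv(a, b):
    """Variable-node convolution: LLRs add, i.e. tanh-domain values combine."""
    return [(wa * wb, (da + db) / (1.0 + da * db)) for wa, da in a for wb, db in b]

def chk_conv(a, b):
    """Check-node convolution: D-domain values multiply."""
    return [(wa * wb, da * db) for wa, da in a for wb, db in b]

def entropy(a):
    """H(a) = sum of w * log2(1 + exp(-llr)) = sum of w * log2(2 / (1 + d))."""
    return sum(w * math.log2(2.0 / (1.0 + d)) for w, d in a if w > 0.0)

def duality_gap(a, b):
    """H(a (*) b) + H(a [+] b) - H(a) - H(b); zero according to Lemma 6."""
    return entropy(var_conv(a, b)) + entropy(chk_conv(a, b)) - entropy(a) - entropy(b)
```

For instance, `duality_gap(bsc(0.1), bsc(0.2))` and `duality_gap(bsc(0.1), bec(0.3))` both vanish up to floating-point error.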

F. Fixed Points, Convergence, and BP Threshold 

We say that the density x is a fixed point (FP) of DE for 
the $(d_l, d_r)$-regular ensemble and the channel c if 

$x = c \circledast (x^{\boxast d_r-1})^{\circledast d_l-1}. \quad (3)$ 

More succinctly, when the underlying ensemble is understood 
from the context, we say that (c, x) is a FP. 

One way to generate a FP is to initialize $x_0$ with $\Delta_0$ and to 
run DE, as stated in Definition 3. We call such a FP a FP of 
forward DE. The resulting FPs are the "natural" FPs since they 
have a natural operational meaning - if we pick sufficiently 
long ensembles, these are the FPs which we can observe in 
simulations when we run the BP decoder. 

Definition 7 (Weak Convergence): We say that a sequence 
of distributions $\{a_i\}$ converges weakly to a limit distribution 
a if for the corresponding cumulative distributions in the 
|D|-domain, call them $\{|\mathfrak{A}_i|\}$, for all bounded and continuous 
functions f(x) on [0, 1] we have 

$\lim_{i \to \infty} \int_0^1 f(x)\,d|\mathfrak{A}_i|(x) = \int_0^1 f(x)\,d|\mathfrak{A}|(x).$ 

An equivalent definition is that $|\mathfrak{A}_i|(x)$ converges to $|\mathfrak{A}|(x)$ at 
points of continuity of $|\mathfrak{A}|$. ■ 

A simple proof of the following lemma can be found at the 
end of Section III-I. 

Lemma 8 (Convergence of Forward DE - [62, Lemma 4.75]): 
The sequence $\{x_\ell\}$ of distributions of forward DE converges 
weakly to a symmetric distribution. 

Lemma 9 (BP Threshold): Consider an ordered and complete 
channel family $\{c_\sigma\}$. Let $x_\ell(\sigma)$ denote the distribution 
in the $\ell$-th round of DE when the channel is $c_\sigma$. Then the BP 
threshold of the $(d_l, d_r)$-regular ensemble is defined as 

$\sigma^{BP}(d_l, d_r) = \sup\{\sigma : x_\ell(\sigma) \xrightarrow{\ell \to \infty} \Delta_{+\infty}\}.$ 

In other words, the BP threshold is characterized by the largest 
channel parameter so that the forward DE FP is trivial. 
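On the BEC the threshold in this definition can be located by bisection over the erasure probability, since forward DE either collapses to zero or stalls at a non-trivial FP. The following is a rough sketch under that BEC specialization; the helper names, iteration caps, and tolerances are illustrative assumptions, and channel parameters extremely close to the threshold may be misclassified because DE converges very slowly there.

```python
def de_reaches_zero(eps, dl, dr, max_iters=50000, small=1e-10):
    """Return True if forward DE on BEC(eps) drives the erasure fraction to ~0."""
    x = 1.0
    for _ in range(max_iters):
        x_next = eps * (1.0 - (1.0 - x) ** (dr - 1)) ** (dl - 1)
        if x_next < small:
            return True
        if x - x_next < 1e-14:  # no further progress: stuck at a non-trivial FP
            return False
        x = x_next
    return False

def bp_threshold_bec(dl, dr, tol=1e-5):
    """Bisection for sup{eps : forward DE converges to the trivial FP}."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if de_reaches_zero(mid, dl, dr):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For the (3, 6)-regular ensemble this returns a value close to 0.4294.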

We have just seen that the FPs of forward DE are important 
since they characterize the BP threshold. But there exist FPs 
that cannot be achieved this way. Let us review a general 
method of constructing FPs. Assume that, given a channel 
family $\{c_\sigma\}$, we need a FP x which has a given error 
probability $\mathfrak{E}(x)$, entropy $H(x)$, or Battacharyya parameter 
$\mathfrak{B}(x)$. Such FPs can often be constructed, or at least their 
existence can be guaranteed, by a procedure introduced in [74]. 
Let us recall this procedure for the case of fixed entropy. 



Consider a smooth, complete, and ordered family $\{c_h\}$ and 
the $(d_l, d_r)$-regular ensemble. Let us denote by $T_h$ the ordinary 
density evolution operator at fixed channel $c_h$. Formally, 

$T_h(a) = c_h \circledast (a^{\boxast d_r-1})^{\circledast d_l-1}. \quad (4)$ 

For any $e \in [0, 1]$, we define the density evolution operator at 
fixed entropy e, call it $R_e$, as 

$R_e(a) = T_{h(a,e)}(a), \quad (5)$ 

where h(a, e) is the solution of $H(T_h(a)) = e$. Whenever 
no such value of h exists, $R_e(a)$ is left undefined. Since, 
for a given a, the family $T_h(a)$ is ordered by degradation, 
$H(T_h(a))$ is a non-decreasing function of h. As a consequence 
the equation $H(T_h(a)) = e$ cannot have more than 
a single solution. Furthermore, by the smoothness of the 
channel family $c_h$, $H(T_h(a))$ is continuous as a function of 
h. Notice that $H(T_0(a)) = 0$: if the channel is noiseless 
the output density at the variable nodes is noiseless as well. 
Therefore, a necessary and sufficient condition for a solution 
h(a, e) to exist (when the family $\{c_h\}$ is complete) is that 
$H(T_1(a)) = H((a^{\boxast d_r-1})^{\circledast d_l-1}) \geq e$ (see Theorem 6 in [74]). 

Definition 10 (DE at Fixed Entropy e): Set $a_0 = c_e$. For 
$\ell \geq 0$ compute $a_{\ell+1} = R_e(a_\ell)$. ■ 
Discussion: It can be shown that if the above procedure gives 
rise to an infinite sequence, i.e., if $R_e(\cdot)$ is well-defined at 
each step, then this sequence has a converging subsequence. 
In fact, in practice one observes that the sequence itself 
converges. The computation of the convolutions is typically 
done numerically either by sampling or via Fourier transforms 
as in ordinary density evolution. Due to the monotonicity 
of $H(T_h(a_\ell))$ in h, the value of $h(a_\ell, e)$ can be efficiently 
found by a bisection method. The procedure is halted when 
some convergence criterion is met - e.g., one can require that 
(a properly defined) distance between $a_\ell$ and $a_{\ell+1}$ becomes 
smaller than a threshold. 

Any FP of the above transformation $R_e$, i.e., any a such 
that $a = R_e(a)$, is also a FP of ordinary density evolution for 
the channel $c_h$ with h = h(a, e). Furthermore, if a sequence 
of densities such that $a_{\ell+1} = R_e(a_\ell)$ converges (weakly) to a 
density a, then a is a FP of $R_e$, with entropy e. 
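To make the procedure concrete, here is a sketch of DE at fixed entropy specialized to the BEC, where a density is a single erasure probability, its entropy equals that probability, and $T_h(a) = h\,(1-(1-a)^{d_r-1})^{d_l-1}$. In this special case the iteration is stationary after one step, so the example mainly illustrates the bisection for $h(a, e)$ and the existence condition $H(T_1(a)) \geq e$; all names are hypothetical.

```python
def t1_entropy_bec(a, dl, dr):
    """Entropy of T_1(a) on the BEC, i.e. the DE map with channel erasure 1."""
    return (1.0 - (1.0 - a) ** (dr - 1)) ** (dl - 1)

def fixed_entropy_step_bec(a, e, dl, dr, iters=60):
    """One application of R_e: bisect for h with H(T_h(a)) = e; None if undefined."""
    g = t1_entropy_bec(a, dl, dr)
    if g < e:  # even the worst channel cannot produce output entropy e
        return None
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * g < e:  # H(T_h(a)) = h * g is non-decreasing in h
            lo = mid
        else:
            hi = mid
    h = 0.5 * (lo + hi)
    return h, h * g

def fixed_entropy_de_bec(e, dl, dr, rounds=20):
    """Run a_{l+1} = R_e(a_l) from a_0 = c_e; return (h, a) or None if undefined."""
    a, h = e, None
    for _ in range(rounds):
        step = fixed_entropy_step_bec(a, e, dl, dr)
        if step is None:
            return None
        h, a = step
    return h, a
```

For the (3, 6)-regular ensemble, `fixed_entropy_de_bec(0.1, 3, 6)` returns a channel parameter near 0.596 with FP entropy 0.1, a FP that forward DE does not reach, while `fixed_entropy_de_bec(0.02, 3, 6)` returns `None` because no solution $h \in [0, 1]$ exists.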

G. BP Threshold for Large Degrees 

What happens to the BP threshold when we fix the design 
rate r = 1 — di/d r and increase the degrees? The proof of the 
following lemma, which uses basic extremes of information 
combining arguments, can be found in Appendix 151 

Lemma 11 (Upper Bound on BP Threshold): Consider 
transmission over an ordered and complete family $\{c_h\}$ of 
BMS channels using a $(d_l, d_r)$-regular dd and BP decoding. 
Let $r = 1 - d_l/d_r$ be the design rate and let $h^{BP}(d_l, d_r)$ denote 
the BP threshold. Then, 



h BP (d;A) < 



t) 



l-((l-rK)e- 



In particular, by increasing d r while keeping the rate r fixed, 
the BP threshold converges to 0. 



H. The Wasserstein Metric: Definition and Basic Properties 

In the sequel we will often need to measure how close var- 
ious distributions are. Sometimes it is convenient to compare 
their entropy or their Battacharyya constant. But sometimes a 
more general measure is required. The Wasserstein metric is 
our measure of choice. 

Definition 12 (Wasserstein Metric - [104, Chapter 6]): 
Let |a| and |b| denote two |D|-distributions. The Wasserstein 
metric, denoted by d(|a|, |b|), is defined as 

$d(|a|, |b|) = \sup_{f(x) \in \mathrm{Lip}(1)[0,1]} \int_0^1 f(x)(|a|(x) - |b|(x))\,dx, \quad (6)$ 

where Lip(1)[0, 1] denotes the class of Lipschitz continuous 
functions on [0, 1] with Lipschitz constant 1. ■ 
Discussion: In the sequel we will say that a function f(x) is 
Lip(c) as a shorthand to mean that it is Lipschitz continuous 
with constant c. If we want to emphasize the domain, then we 
write e.g., Lip(c)[0, 1]. Why have we defined the metric in the 
|D|-domain? As the next lemma shows, convergence in this 
metric implies weak convergence. Since all the distributions 
of interest are symmetric, it suffices to look at the |D|-domain 
instead of the D-domain. To ease our notation, however, we 
will formally write expressions like d(a, b), i.e., we will allow 
the arguments to be e.g. L-distributions. It is then implied 
that the metric is determined using the equivalent |D|-domain 
representations as defined above. 

Lemma 13 (Basic Properties of the Wasserstein Metric): 
In the following, a, b, c, and d denote L-distributions. 

In the |D|-domain we have the following expressions for 
$\mathfrak{B}(a)$ and $H(a)$ (compare this to the expressions in the L-domain 
given in Section III-D), 

$\mathfrak{B}(|a|) = \int_0^1 \sqrt{1 - x^2}\,|a|(x)\,dx, \quad H(|a|) = \int_0^1 h_2\left(\frac{1-x}{2}\right)|a|(x)\,dx,$ 

where $h_2(x) = -x \log_2 x - (1 - x) \log_2(1 - x)$ is the binary 
entropy function. See [104], [105] for more details on metrics 
for probability measures. 

(i) Alternative Definitions: 

$d(a, b) = \inf_{p(x,y):\, p(x) \sim |a|,\, p(y) \sim |b|} E[|X - Y|],$ 

$d(a, b) = \int_0^1 \left| |\mathfrak{A}|(x) - |\mathfrak{B}|(x) \right| dx.$ 

(ii) Boundedness: $d(a, b) \leq 1$. 

(iii) Metrizable and Weak Convergence: The Wasserstein metric 
induces the weak topology on the space of probability 
measures on [0, 1]. In other words, the space of probability 
measures under the weak topology is metrizable 
and convergence in the Wasserstein metric is equivalent 
to weak convergence (see [104, Theorem 6.9]). 

(iv) Polish Space: The space of probability distributions on 
[0, 1] metrized by the Wasserstein distance is a complete 
separable metric space, i.e., a Polish space, and any measure 
can be approximated by a sequence of probability 
measures with finite support, i.e., distributions of the form 
$\sum_{i=1}^{n} c_i \delta(x - x_i)$, where $\sum_{i=1}^{n} c_i = 1$, $c_i \geq 0$, and 
$x_i \in [0, 1]$. Further, the space is compact. (See [104, 
Theorem 6.18].) 

(v) Convexity: Let $\alpha \in [0, 1]$ and $\bar{\alpha} = 1 - \alpha$. Then 

$d(\alpha a + \bar{\alpha} b, \alpha c + \bar{\alpha} d) \leq \alpha d(a, c) + \bar{\alpha} d(b, d).$ 

In general, if $\sum_i \alpha_i = 1$, then 

$d(\sum_i \alpha_i a_i, \sum_i \alpha_i b_i) \leq \sum_i \alpha_i d(a_i, b_i).$ 

(vi) Regularity wrt $\circledast$: The Wasserstein metric satisfies the 
regularity property $d(a \circledast c, b \circledast c) \leq 2 d(a, b)$, so that 

$d(a \circledast c, b \circledast d) \leq d(a \circledast c, b \circledast c) + d(b \circledast c, b \circledast d)$ 
$\quad \leq 2 d(a, b) + 2 d(c, d),$ 

and for $i \geq 2$ and any distribution c, $d(a^{\circledast i} \circledast c, b^{\circledast i} \circledast c) \leq 2 i\, d(a, b)$. 

(vii) Regularity wrt $\boxast$: The Wasserstein metric satisfies 
the regularity property $d(a \boxast c, b \boxast c) \leq d(a, b)\sqrt{1 - \mathfrak{B}^2(c)} \leq d(a, b)$, so that 

$d(a \boxast c, b \boxast d) \leq d(a \boxast c, b \boxast c) + d(b \boxast c, b \boxast d)$ 
$\quad \leq d(a, b) + d(c, d).$ 

Further, 

$d(a^{\boxast i}, b^{\boxast i}) \leq d(a, b) \sum_{j=1}^{i} (1 - \mathfrak{B}^2(a))^{\frac{i-j}{2}} (1 - \mathfrak{B}^2(b))^{\frac{j-1}{2}}.$ 

(viii) Regularity wrt DE: Let $T_c(\cdot)$ denote the DE operator for 
the dd $(d_l, d_r)$ and the channel c. Then $d(T_c(a), T_c(b)) \leq \alpha\, d(a, b)$, with 

$\alpha = 2(d_l - 1) \sum_{j=1}^{d_r-1} (1 - \mathfrak{B}^2(a))^{\frac{d_r-1-j}{2}} (1 - \mathfrak{B}^2(b))^{\frac{j-1}{2}}.$ 

(ix) Wasserstein Bounds Battacharyya and Entropy: 

$|\mathfrak{B}(a) - \mathfrak{B}(b)| \leq \sqrt{d(a, b)}\sqrt{2 - d(a, b)},$ 
|H(a)-H(b)|</, 2 (^) 

< ^2V^b)V2-d(a,b). 

(x) Battacharyya Sometimes Bounds Wasserstein: 

$d(\Delta_0, a) \leq \sqrt{1 - \mathfrak{B}(a)^2} \leq \sqrt{2(1 - \mathfrak{B}(a))},$ 
$d(\Delta_{+\infty}, a) \leq \mathfrak{B}(a).$ 

Discussion: Perhaps the most useful property of the Wasserstein 
metric is that it interacts nicely with the operations of 
variable- and check-node convolution. This is the essence 
of properties (vi), (vii), and (viii). For example, it is easy 
to see why property (viii) might be useful: Given that two 
distributions a and b are close, it asserts that after one iteration 
of DE these two distributions are again close. Indeed, as we 
will see shortly, depending on the Battacharyya parameter of 
the starting distributions the distance might in fact become 
smaller, i.e., we might have a contraction. 
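For |D|-distributions with finitely many point masses, the second alternative definition in property (i), the integral of the absolute difference of the cumulative distributions, gives a direct way to compute the metric. The sketch below assumes a list of `(weight, x)` pairs with $x \in [0, 1]$ as the representation; the helper names are invented for this illustration.

```python
import math

def bec_abs_d(eps):
    """|D|-distribution of BEC(eps): mass eps at 0 and mass 1 - eps at 1."""
    return [(eps, 0.0), (1.0 - eps, 1.0)]

def bsc_abs_d(p):
    """|D|-distribution of BSC(p): a single mass at 1 - 2p."""
    return [(1.0, 1.0 - 2.0 * p)]

def wasserstein(a, b):
    """d(a, b) as the integral over [0, 1] of |CDF_a(x) - CDF_b(x)|."""
    points = sorted({x for _, x in a} | {x for _, x in b} | {0.0, 1.0})

    def cdf(masses, t):
        return sum(w for w, x in masses if x <= t)

    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both CDFs are constant on [left, right).
        total += abs(cdf(a, left) - cdf(b, left)) * (right - left)
    return total

def battacharyya(a):
    """Battacharyya parameter in the |D|-domain: sum of w * sqrt(1 - x^2)."""
    return sum(w * math.sqrt(1.0 - x * x) for w, x in a)
```

As a usage example, two BEC densities are at distance $|\epsilon_1 - \epsilon_2|$, and for BSC(0.1) the distance to $\Delta_{+\infty}$ (a unit mass at 1) is 0.2, which is below the Battacharyya parameter 0.6 as property (x) requires.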



I. Wasserstein Metric and Degradation 

When densities are ordered by degradation, the Wasserstein 
metric inherits some additional properties. 

Lemma 14 (Wasserstein Metric and Degradation): In the 
following a and b denote L-distributions. 

(i) Wasserstein versus Degradation: Let $a \prec b$. Let $|\mathfrak{A}|$ and 
$|\mathfrak{B}|$ denote the corresponding |D|-domain cdfs. Define 
$D(a, b) = \int_0^1 x(|\mathfrak{B}|(x) - |\mathfrak{A}|(x))\,dx$. Note that D(a, b) 
can be seen as a measure of how much b is degraded 
wrt a since it is the average of the non-negative integrals 
$\int_z^1 (|\mathfrak{B}|(x) - |\mathfrak{A}|(x))\,dx$ (cf. ©). Then 

$D(a, b) \geq d^2(a, b)/4.$ 

Furthermore, $D(a, b) \leq 1$ and for any symmetric densities 
such that $a \prec b \prec c$, $D(a, c) = D(a, b) + D(b, c)$. 

(ii) Entropy and Battacharyya Bound Wasserstein Distance: 
Let $a \prec b$. Then 

$d(a, b) \leq 2\sqrt{(\ln 2)(H(b) - H(a))} \leq 2\sqrt{\mathfrak{B}(b) - \mathfrak{B}(a)}$ 

and $\mathfrak{B}(b) - \mathfrak{B}(a) \leq \sqrt{2(H(b) - H(a))}$. 

(iii) Continuity for Ordered Families: Consider a smooth 
family of L-distributions $\{c_\sigma\}_{\underline{\sigma}}^{\overline{\sigma}}$ ordered by degradation 
so that $\mathfrak{B}(\cdot)$ is continuous wrt $\sigma \in [\underline{\sigma}, \overline{\sigma}]$. Then the 
Wasserstein metric is also continuous in $\sigma$. 

Discussion: Property (i) is particularly useful. Imagine a 
sequence of distributions $\{a_i\}_{i=0}^{n}$ ordered by degradation, 
i.e., $a_0 \prec a_1 \prec \cdots \prec a_n$. Then $a_0 \prec a_n$ and we know 
from (i) that $D(a_0, a_n) = \int_0^1 z(|\mathfrak{A}_n|(z) - |\mathfrak{A}_0|(z))\,dz$ is non-negative 
since it is the "average" of the non-negative integrals 
$\int_z^1 (|\mathfrak{A}_n|(x) - |\mathfrak{A}_0|(x))\,dx$. Now note that $D(\cdot, \cdot)$ is additive and 
that $D(a_0, a_n) \leq 1$. From these two facts we can conclude 
that there must exist an index i, $0 \leq i \leq n - 1$, so that 
$D(a_i, a_{i+1}) \leq \frac{1}{n}$. More generally, we can conclude for any 
$1 \leq k \leq n$ that there must exist an index i, $0 \leq i \leq n - k$, 
so that $D(a_i, a_{i+k}) \leq \min\{\frac{k}{n-k+1}, 1\} \leq \frac{2k}{n}$. This follows 
by upper bounding the average of all these n - k + 1 such 
distances. By property (i) this implies "closeness" also in the 
Wasserstein sense. In words, in a sequence of distributions ordered 
by degradation we are always able to find a subsequence 
of distributions which are "close" in the Wasserstein sense. 

As an exercise in using the basic properties of the Wasser- 
stein distance, let us give a proof of Lemma [8] 

Proof: Since we are considering a sequence of distributions 
obtained by forward DE, we have $x_\ell \succ x_{\ell+1}$ 
for $\ell \geq 0$. Therefore, the quantities $D(x_\ell, x_{\ell+1})$ are non-negative 
and they are additive in the sense that $D(x_0, x_n) = 
\sum_{\ell=0}^{n-1} D(x_\ell, x_{\ell+1})$. Further, $D(\cdot, \cdot)$ is upper bounded by 1. 
It follows that $\{x_\ell\}$ forms a Cauchy sequence wrt $D(\cdot, \cdot)$ 
and hence also wrt $d(\cdot, \cdot)$. This in turn implies that $\{x_\ell\}$ 
converges wrt $d(\cdot, \cdot)$ and this convergence is equivalent to 
weak convergence. Finally, symmetry can be tested in terms 
of bounded continuous functionals and weak convergence 
preserves such functionals. ■ 

J. GEXIT Curve 

As we have discussed in the preceding section, FPs of DE 
play a crucial role in the asymptotic analysis. E.g., the BP 
threshold is characterized by the existence/non-existence of a 
non-trivial FP of forward DE for a particular channel. 

An even more powerful picture arises if instead of looking 
at a single FP at a time we visualize a whole collection of FPs. 
In order to visualize many FPs at the same time it is convenient 
to project them. E.g., given the FP pair (c, x) we might decide 
to plot the point (H(c),H(x)) in the two-dimensional unit box 
[0,1] x [0,1]. 

Example 15 (BP EXIT Curve for BEC): Note that for the 
BEC, erasure probability is equal to Battacharyya parameter, 
and also equal to entropy. Even though all these parameters 
are equal in this case, our language will reflect that we are 
plotting entropy. 

Rather than plotting x itself it is convenient to plot the EXIT 
value $(1 - (1 - x)^{d_r-1})^{d_l}$. This is the locally best estimate 
of a bit based on the internal messages only, excluding the 
direct observation. For this choice the resulting curve is usually 
called the BP EXIT curve, see [106], [107] and [62, Sections 
3.14 and 4.10]. It is the BP EXIT curve since the estimate 
is a BP estimate. And it is the BP EXIT (where the E stands 
for "extrinsic") curve since the estimate excludes the received 
value associated to this bit. 

The FP equation is $x = \epsilon(1 - (1 - x)^{d_r-1})^{d_l-1}$, which we 
can solve for $\epsilon$ to get 

$\epsilon(x) = \frac{x}{(1 - (1 - x)^{d_r-1})^{d_l-1}}. \quad (7)$ 

Using (7) we can write down the parametric characterization 
of the BP EXIT curve 

$\{(\epsilon(x), (1 - (1 - x)^{d_r-1})^{d_l})\}.$ 

This curve is shown in the left-hand side in Figure 1 for the 
(3, 6)-regular ensemble and has a typical C shape. In fact, 
one can show that, in this case, for $\epsilon < \epsilon^{BP}(d_l, d_r)$ (the BP 
threshold) there is only one FP at x = 0 corresponding to 
perfect decoding; for $\epsilon = \epsilon^{BP}(d_l, d_r)$ there are 2 FPs, one is at 
x = 0 and the other is the FP corresponding to forward DE; 
and for $\epsilon > \epsilon^{BP}(d_l, d_r)$ there are exactly 3 FPs of DE, one 
of the FPs is at x = 0 and the remaining two FPs are strictly 
positive, one of which is stable, denoted by $x_s(\epsilon)$, whereas the 
other is unstable, denoted by $x_u(\epsilon)$. The stable FP is the FP 
which is reached by forward DE. For details see Lemma 59. 
A quantity which will appear throughout this paper is the 
value of the unstable FP when transmitting over BEC($\epsilon = 1$). 
We denote this FP by $x_u(1)$. More precisely, $x_u(1)$ is the 
smaller non-zero solution of $x = (1 - (1 - x)^{d_r-1})^{d_l-1}$. Note 
that $x_u(1)$ depends on the degrees, but we drop it from the 
notation for ease of exposition. 
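The quantities in this example are straightforward to evaluate numerically. The sketch below samples the parametric curve $(\epsilon(x), \text{EXIT value})$, reads off the BP threshold as the smallest channel parameter on the curve (the leftmost point of the C shape), and finds $x_u(1)$ by bisection. The function names, the grid size, and the bisection bracket $[10^{-9}, 0.5]$ are assumptions chosen for the (3, 6) ensemble rather than general-purpose settings.

```python
def eps_of_x(x, dl, dr):
    """Channel parameter for which x is a FP: eps(x) = x / (1-(1-x)^(dr-1))^(dl-1)."""
    return x / (1.0 - (1.0 - x) ** (dr - 1)) ** (dl - 1)

def exit_value(x, dl, dr):
    """EXIT value (1 - (1-x)^(dr-1))^dl associated with the FP x."""
    return (1.0 - (1.0 - x) ** (dr - 1)) ** dl

def bp_exit_curve(dl, dr, n=1000):
    """Sample the parametric BP EXIT curve for x in (0, 1]."""
    return [(eps_of_x(i / n, dl, dr), exit_value(i / n, dl, dr)) for i in range(1, n + 1)]

def bp_threshold_from_curve(dl, dr, n=100000):
    """Smallest channel parameter on the curve, approximated on a grid."""
    return min(eps_of_x(i / n, dl, dr) for i in range(1, n + 1))

def unstable_fp_at_eps_one(dl, dr, lo=1e-9, hi=0.5, iters=200):
    """Bisection for the smaller non-zero root of x = (1-(1-x)^(dr-1))^(dl-1)."""
    def f(x):
        return (1.0 - (1.0 - x) ** (dr - 1)) ** (dl - 1) - x
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For the (3, 6)-regular ensemble the threshold comes out near 0.4294 and $x_u(1)$ lies between 0.04 and 0.06.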

Discussion: The above example raises the following two 
questions. (1) We have a large degree of freedom in selecting 
the projection operator. Which one is "best"? (2) From the 
above example we see that the set of FPs forms a smooth 
curve. Indeed, for the BEC it is not hard to see that the only 
FPs are the ones on the curve together with all the FPs of 
the form $(c_\epsilon, \Delta_{+\infty})$, where $c_\epsilon$ is any element of the family of 
BEC channels and $\Delta_{+\infty}$ corresponds to erasure value of 0. Is 
this picture still valid for general channel families? 




Fig. 1. Left: The BP EXIT curve of the $(d_l = 3, d_r = 6)$-regular ensemble 
when transmitting over the BEC. The curve has a characteristic "C" shape. 
Right: The construction of the MAP threshold from the BP EXIT curve. The 
dark gray area is equal to the design rate of the code. 



In the remainder of this section we address the first question, 
i.e., we will discuss a particularly effective choice of the 
projection operator. In the next section we will address the 
question of the existence and nature of this curve for the 
general case, presenting some partial results. 

A good choice for the projection operator for general 
channels is the GEXIT functional [74]. For the BEC this 
coincides with the EXIT functional that we saw in Example 15. 
For the general case take a FP $(c_\sigma, x_\sigma)$ and define $y = x_\sigma^{\boxast d_r-1}$. 
Then 

$G(c_\sigma, y^{\circledast d_l}) = \frac{\frac{d}{d\sigma} H(c_\sigma \circledast y^{\circledast d_l})}{\frac{d}{d\sigma} H(c_\sigma)},$ 

where we think of y as fixed with respect to $\sigma$. In words, 
$G(c_\sigma, \cdot)$ measures the ratio of the change in entropy of 
$c_\sigma \circledast y^{\circledast d_l}$ (the entropy of the decision of any variable node under 
BP decoding) versus the change of entropy of the channel $c_\sigma$ 
as a function of $\sigma$. 

Discussion: Note that if the parameterization in $\sigma$ is Lipschitz, 
i.e., if for some positive constant $\alpha$, $|H(c_{\sigma_2}) - H(c_{\sigma_1})| \leq 
\alpha|\sigma_2 - \sigma_1|$, then the derivative $\frac{d}{d\sigma} H(c_\sigma)$ exists almost everywhere. 
Further, in this case also $H(c_\sigma \circledast y^{\circledast d_l})$ is Lipschitz and 
hence differentiable almost everywhere. This follows since by 
(the Duality Rule in) Lemma 6 for $\sigma_2 > \sigma_1$, 

$[H(c_{\sigma_2} \circledast y^{\circledast d_l}) - H(c_{\sigma_1} \circledast y^{\circledast d_l})]$ 
$\quad + [H(c_{\sigma_2} \boxast y^{\circledast d_l}) - H(c_{\sigma_1} \boxast y^{\circledast d_l})]$ 
$\quad = [H(c_{\sigma_2}) - H(c_{\sigma_1})] \leq \alpha|\sigma_2 - \sigma_1|,$ 

where the last step on the right-hand side assumes that the 
parameterization is such that $H(c_\sigma)$ increases in $\sigma$. The claim 
follows since both terms on the left are non-negative (due 
to degradation), so that in particular the first term is upper 
bounded by $\alpha|\sigma_2 - \sigma_1|$, i.e., it is Lipschitz. This formulation 
also shows that the numerator is no larger than the denominator 
(so that the ratio exists) and that the GEXIT value is upper 
bounded by 1 (and is non-negative). 

We get the GEXIT curve by plotting $(H(c_\sigma), G(c_\sigma, y^{\circledast d_l}))$ 
for a family of FPs $\{c_\sigma, x_\sigma\}$. This is shown in Figure 2 for 
the (3, 6)-regular ensemble assuming that transmission takes 
place over the BAWGNC. In the last section we have already 
explained how we can construct in the general case FPs by 
a numerical procedure. To plot Figure 2 we have used this 
procedure to get a complete family of FPs for all entropies 
from 0 to 1. In each of the two pictures of Figure 2 there is 
a small black dot. This dot marks a particular FP and the two 
small insets show the corresponding distribution of the channel 
$c_\sigma$ as well as the message distribution emitted at the variable 
nodes, call it $x_\sigma$. For a detailed discussion we refer the reader 
to [62], [74]. 





Fig. 2. The BP GEXIT curve for the $(d_l = 3, d_r = 6)$-regular ensemble 
and transmission over the BAWGNC, plotted against $H(c_\sigma)$. Each point on 
the curve corresponds to a FP $(c_\sigma, x_\sigma)$ of DE. The two figures show the 
FP density x as well as the input density $c_\sigma$ for two points on the curves 
(see insets). 



Why do we use this particular representation? As we will 
discuss in detail in Section III-L, assuming this curve indeed 
exists and is "smooth", the area which is enclosed by it is 
equal to $r = 1 - d_l/d_r$, the design rate of the ensemble. 

This is easy to see for the BEC. To simplify notation, denote 
the GEXIT value in this case by $G(\epsilon, y^{d_l})$, where $\epsilon$ is the 
erasure probability, x is the FP for this channel parameter, 
and $y = 1 - (1 - x)^{d_r-1}$. We then have $G(\epsilon, y^{d_l}) = (1 - (1 - x)^{d_r-1})^{d_l}$. 
Let us integrate the area which is enclosed by this 
curve. We call the corresponding integral the GEXIT integral. 
For our particular case it is given by 

$\int (1 - (1 - x(\epsilon))^{d_r-1})^{d_l}\,d\epsilon = \int_0^1 (1 - (1 - x)^{d_r-1})^{d_l}\,\epsilon'(x)\,dx$ 

$\quad = \epsilon(x)(1 - (1 - x)^{d_r-1})^{d_l}\Big|_0^1 - d_l(d_r - 1)\int_0^1 x(1 - x)^{d_r-2}\,dx$ 

$\quad = 1 - d_l(d_r - 1)\int_0^1 x(1 - x)^{d_r-2}\,dx$ 

$\quad = 1 + d_l\,x(1 - x)^{d_r-1}\Big|_0^1 - d_l\int_0^1 (1 - x)^{d_r-1}\,dx = 1 - \frac{d_l}{d_r}.$ 

Perhaps surprisingly, the result stays valid for general channels 
as we will discuss in Section III-L. This property is one of the 
main ingredients in our proof. 
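The BEC computation can also be confirmed numerically by integrating along the parametric curve $(\epsilon(x), (1-(1-x)^{d_r-1})^{d_l})$ for $x$ from (almost) 0 to 1. The sketch below uses a plain trapezoidal rule in the parameter $x$; the function name and grid size are illustrative, and the grid starts at $x = 1/n$ because $\epsilon(x)$ is unbounded as $x \to 0$ (the EXIT value vanishes fast enough there that the omitted piece is negligible).

```python
def gexit_area_bec(dl, dr, n=200000):
    """Trapezoidal estimate of the integral of G d(eps) along the BP EXIT curve."""
    def eps(x):
        return x / (1.0 - (1.0 - x) ** (dr - 1)) ** (dl - 1)

    def g(x):
        return (1.0 - (1.0 - x) ** (dr - 1)) ** dl

    area = 0.0
    x_prev = 1.0 / n
    e_prev, g_prev = eps(x_prev), g(x_prev)
    for i in range(2, n + 1):
        x = i / n
        e_cur, g_cur = eps(x), g(x)
        # Signed contribution: eps decreases along the lower branch of the "C".
        area += 0.5 * (g_prev + g_cur) * (e_cur - e_prev)
        e_prev, g_prev = e_cur, g_cur
    return area
```

For the (3, 6) ensemble the estimate is close to the design rate 1/2, and for the (3, 4) ensemble it is close to 1/4.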

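Note that given $c_h$ and $z_h$, the GEXIT functional $G(c_h, z_h)$ 
can be expressed in the form $\int z_h(w) f(h, w)\,dw$, where 
$f(h, w)$ is called the GEXIT kernel. In the |D|-domain this 
kernel is given by 

$f(h, w) = \int_0^1 \frac{d|c_h|(z)}{dh}\,k(z, w)\,dz, \quad k(z, w) = \sum_{i,j=\pm 1} \frac{(1+iz)(1+jw)}{4} \log_2\left(1 + \frac{(1-iz)(1-jw)}{(1+iz)(1+jw)}\right). \quad (8)$ 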

For a proof of the following see [62, Lemma 4.77]. 

Lemma 16 (GEXIT for Smooth and Ordered Channels): 
For a smooth, ordered channel family $\{c_h\}_h$, $f(h, w)$, 
as a function of w, exists, is continuous, non-negative, 
non-increasing and concave on its entire domain. Further 
$f(h, 0) = 1$ and $f(h, 1) = 0$. 

We remark that the above lemma also holds when $\{c_h\}$ is 
piece-wise linear. 

K. Existence of GEXIT Curve 

As we briefly discussed above, for the BEC it is trivial to 
see that the BP GEXIT curve indeed exists. But for general 
BMS channels this is not immediate. The aim of this section 
is to show the existence of the BP GEXIT curve for at least 
a subset of parameters. 

Let us first recall the following lemma which was stated and 
proved in a slightly weaker form in [108]. For the convenience 
of the reader we reproduce the proof in Appendix E. 

Lemma 17 (Sufficient Condition for Continuity): Assume 
that communication takes place over an ordered and complete 
family $\{c_h\}_h$, where $h = H(c_h)$, using the dd pair $(d_l, d_r)$. 
Then, for any $h \in [0, 1]$, there exists at most one density $x_h$ 
so that $(c_h, x_h)$ forms a FP which fulfills 

$\mathfrak{B}(c_h)(d_l - 1)(d_r - 1)(1 - \mathfrak{B}(x_h)^2)^{d_r-2} < 1. \quad (9)$ 

Furthermore, if such a density $x_h$ exists, then it coincides with 
the forward DE FP. Finally, $\mathfrak{B}(x_h)$ is Lipschitz continuous 
with respect to $\mathfrak{B}(c_h)$. More precisely, if two FPs $(c_{h_1}, x_{h_1})$ 
and $(c_{h_2}, x_{h_2})$ satisfy the condition 
$\mathfrak{B}(c_{h_i})(d_l - 1)(d_r - 1)(1 - \mathfrak{B}(x_{h_i})^2)^{d_r-2} \leq 1 - \delta$ for some $\delta > 0$, then 

$|\mathfrak{B}(x_{h_1}) - \mathfrak{B}(x_{h_2})| \leq \frac{1}{\delta}|\mathfrak{B}(c_{h_1}) - \mathfrak{B}(c_{h_2})|. \quad (10)$ 

The following lemma states that, at least for sufficiently 
large entropies, the BP GEXIT curve indeed exists and is well 
behaved. 

Lemma 18 (Continuity For Large Entropies): Assume that 
communication takes place over an ordered and complete 
family $\{c_h\}_h$, where $h = H(c_h)$, using the dd pair $(d_l, d_r)$. 
Consider the set of FP pairs $\{(c_h, x_h)\}$ obtained by applying 
forward DE to each channel $c_h$. Let 

$a(x) = (1 - (1 - x)^{d_r-1})^{d_l-1},$ 

$b(x) = (d_l - 1)^2 (d_r - 1)^2\,x(1 - x)^{2(d_r-2)},$ 

$c(x) = \sqrt{x/a(x)}.$ 

Let $\tilde{x}$ be the unique solution in (0, 1] of the equation 

$a(x) - b(x) = 0. \quad (11)$ 

Then the family $\{(c_h, x_h)\}_{h=\tilde{h}(d_l, d_r, \{c_h\})}^{1}$, with 
$\tilde{h}(d_l, d_r, \{c_h\}) = h_{BMS}(c(\tilde{x}))$, satisfies (9), is Lipschitz 
continuous wrt the Battacharyya parameter of the 
channel, where $h_{BMS}(\cdot)$ is the function which maps the 
Battacharyya constant of an element of the family to the 
corresponding entropy. Further, $\mathfrak{B}(x_h) \geq x_u(1) > 0$ for all 
$h \geq \tilde{h}(d_l, d_r, \{c_h\})$. 

$(d_l, d_r)$   $\tilde{x}$     $d_l/d_r$   $\mathfrak{B} = h_{BEC}$   $h_{BAWGNC}$   $h_{BSC}$   $\overline{h}$ 
(3, 4)     0.5479   0.75    0.8156    0.7544    0.7428   0.8254 
(6, 8)     0.4107   0.75    0.6822    0.5971    0.5694   0.6958 
(9, 12)    0.3277   0.75    0.6024    0.5097    0.4719   0.6185 
(12, 16)   0.2752   0.75    0.5483    0.4530    0.4087   0.5658 
(3, 6)     0.3805   0.5     0.6787    0.5931    0.5651   0.7010 
(4, 8)     0.3512   0.5     0.6384    0.5485    0.5152   0.6590 
(5, 10)    0.3192   0.5     0.6022    0.5094    0.4717   0.6229 
(6, 12)    0.2916   0.5     0.5717    0.4773    0.4357   0.5924 
(3, 12)    0.2127   0.25    0.4970    0.4012    0.3513   0.5335 
(4, 16)    0.1957   0.25    0.4690    0.3736    0.3210   0.5005 
(5, 20)    0.1774   0.25    0.4426    0.3481    0.2933   0.4721 
(6, 24)    0.1616   0.25    0.4200    0.3267    0.2702   0.4483 
(7, 28)    0.1483   0.25    0.4006    0.3086    0.2509   0.4281 

TABLE I 

Top branches of GEXIT curves are Lipschitz continuous from the 
indicated channel entropy until 1. The numbers $\tilde{x}$, $\mathfrak{B} = h_{BEC}$, 
$h_{BAWGNC}$, and $h_{BSC}$ are computed according to Lemma 18. The 
final number $\overline{h}$ is a universal upper bound, valid for all BMS 
channels, and it was computed according to Lemma 19. 



Table I shows the resulting bounds for various regular dds 
and various channels. These bounds were computed as follows. 
For a fixed dd pair $(d_l, d_r)$ we first computed $\tilde{x}$ numerically. 
This is easy to do since we know that there is a unique 
solution of the equation $a(x) - b(x) = 0$ in (0, 1]. Further, 
$a(0) - b(0) = 0$, $a'(0) - b'(0) = -(d_l - 1)^2 (d_r - 1)^2 < 0$, 
and $a(1) - b(1) = 1$. We can therefore find this unique 
solution efficiently via bisection. Once $\tilde{x}$ is found, we find 
the corresponding Battacharyya parameter of the channel by 
computing $c(\tilde{x})$. Finally, we can convert this into an entropy 
value via the appropriate function $h_{BMS}(\cdot)$. E.g., for the family 
of BSC channels we have $h_{BSC}(x) = h_2(\frac{1}{2}(1 - \sqrt{1 - x^2}))$. 
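The recipe just described translates into a few lines of code. The following sketch assumes the functions $a(x) = (1-(1-x)^{d_r-1})^{d_l-1}$, $b(x) = (d_l-1)^2(d_r-1)^2 x(1-x)^{2(d_r-2)}$ and $c(x) = \sqrt{x/a(x)}$ of Lemma 18, brackets the root of $a - b$ between a tiny positive value and 1, and converts the resulting Battacharyya parameter to an entropy for the BEC (identity map) and the BSC. Function names and the bracket are choices made for this example.

```python
import math

def a_fn(x, dl, dr):
    return (1.0 - (1.0 - x) ** (dr - 1)) ** (dl - 1)

def b_fn(x, dl, dr):
    return (dl - 1) ** 2 * (dr - 1) ** 2 * x * (1.0 - x) ** (2 * (dr - 2))

def solve_x(dl, dr, iters=200):
    """Bisection for the root of a(x) - b(x) in (0, 1]; negative near 0, equal to 1 at x = 1."""
    lo, hi = 1e-9, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if a_fn(mid, dl, dr) - b_fn(mid, dl, dr) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def continuity_bounds(dl, dr):
    """Return (x, Battacharyya = h_BEC, h_BSC) as described in the text."""
    x = solve_x(dl, dr)
    batt = math.sqrt(x / a_fn(x, dl, dr))
    h_bsc = h2(0.5 * (1.0 - math.sqrt(1.0 - batt * batt)))
    return x, batt, h_bsc
```

For the (3, 6) ensemble, `continuity_bounds(3, 6)` agrees with the tabulated values 0.3805, 0.6787, and 0.5651 to about three decimal places.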

Although it is easy and stable to compute the above lower 
bound on the entropy numerically, it will be convenient to 
have a universal and analytic such lower bound. This is 
accomplished in the following lemma, whose proof can be 
found in Appendix [E] 

Lemma 19 (Universal Bound on Continuity Region): 
Assume that communication takes place over an ordered 
and complete family $\{c_h\}_h$, where $h = H(c_h)$, using the 
dd pair $(d_l, d_r)$ with $d_r \geq 4$ and $d_l \geq 3$. Let a(x) be defined 
as in Lemma 18. Consider the set of FP pairs $\{(c_h, x_h)\}$ 
which is derived by applying forward DE to each channel $c_h$. 
Then the GEXIT curve associated to $\{(c_h, x_h)\}_{h \geq \overline{h}}$, where 

$\overline{x} = 1 - ((d_l - 1)(d_r - 1))^{-\frac{1}{d_r-2}}, \quad \overline{h} = \sqrt{\overline{x}/a(\overline{x})},$ 

is Lipschitz continuous wrt the Battacharyya parameter of the 
channel. Also, $\tilde{h}(d_l, d_r, \{c_h\}) \leq \overline{h}$, where $\tilde{h}(d_l, d_r, \{c_h\})$ is the 

quantity introduced in Lemma [TU and h < 63 ^± , so that 

2 ~ (d r -2)* 

h tends to zero when d r tends to infinity. 

Table I lists these universal upper bounds $\overline{h}$ for all the dds. 
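The universal bound can be evaluated in closed form. The sketch below assumes that the quantities in Lemma 19 read $\overline{x} = 1 - ((d_l-1)(d_r-1))^{-1/(d_r-2)}$ and $\overline{h} = \sqrt{\overline{x}/a(\overline{x})}$ with $a(x) = (1-(1-x)^{d_r-1})^{d_l-1}$; this reading is an assumption of the example, supported by the fact that it reproduces the last column of Table I. The function name is hypothetical.

```python
import math

def universal_bound(dl, dr):
    """Return (x_bar, h_bar) under the assumed closed form of Lemma 19."""
    x_bar = 1.0 - ((dl - 1) * (dr - 1)) ** (-1.0 / (dr - 2))
    a_val = (1.0 - (1.0 - x_bar) ** (dr - 1)) ** (dl - 1)
    return x_bar, math.sqrt(x_bar / a_val)
```

For example, `universal_bound(3, 6)[1]` is about 0.7010 and `universal_bound(3, 4)[1]` is about 0.8254, matching the tabulated entries.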

The following corollary follows immediately from 
Lemma 17, property (ii) of Lemma 14 and property (ix) of 
Lemma 13. 

Corollary 20 (Continuity of Entropy): Let $\{c_h\}$ be a 
smooth BMS channel family and let $(c_h, x_h)$ denote a forward 
DE FP pair with channel entropy $h \geq \tilde{h}(d_l, d_r, \{c_h\})$, where 
$\tilde{h}(d_l, d_r, \{c_h\})$ is the value defined in Lemma 18. Then for 
$h_1, h_2 \geq \tilde{h}(d_l, d_r, \{c_h\})$ we have 

3 Note that we have made the dependence on the channel family, $\{c_h\}$, 
explicit in the notation of $\tilde{h}(d_l, d_r, \{c_h\})$. 



(M2)) 2 



2 

32* 



|H(x hl ) -H(x h2 )| 2 < d(x hl ,x h2 ) < 



. 1 2V2 

-7r* (Chi ' ChJ, -w 



(ln(2)|H(c hl )-H(c ta )|)i 



The proof of the following lemma can be found in Ap- 
pendix [F] 

Lemma 21 (Entropy Product Inequality): Given a and b, 



H(a © b) = f f 
Jo Jo 



JO 



\a\(x)\b\(y)k(x,y)dxdy 

|2t| ( x )\^\ (y)k xxyy (x, y)dxdy, 



where 



2, ,2 



2 1 + ix z y 
ln(2) (1 -~x 2 y 2 ) 3 '' 



and where the cumulative distributions |2t|(x) = J? |a|(z)dz, 
\^B\(x) = f* \b\(z)dz are used to define \SL\(x) = |2l|(z)dz 
and |S|(3;) = J |Q5|(z)d2 and the kernel k(x,y) is as given 
in (O. We claim that 
(i) Bound on Kernel: 



8 (i-x 2 )-i(i-j/ 2 )-l. 



ln(2) 



(ii) Bound for Partially Degraded Case: Let a' be degraded 
with respect to the channel density a and let b' be such 
that d(b',b) < 5. Then 

H((a' - a) © (b' - b)) < JLj W - a)V25. 

(iii) Bound for Fully Degraded Case: Let a' be degraded with 
respect to the channel density a and let b' be degraded 
with respect to the channel density b. Then 



H((a'-a)©(b'-b)) < 



ln(2) 



s B(a'-a) s B(b'- b). 



Corollary 22 (Continuity of the BP GEXIT Curve): Let 
$\{c_h\}$ be a smooth BMS channel family and let $(c_h, x_h)$ denote a 
forward DE FP pair with channel entropy $h \geq \tilde{h}(d_l, d_r, \{c_h\})$, 
where $\tilde{h}(d_l, d_r, \{c_h\})$ is the value defined in Lemma 18. 
Then, $G(c_h, (x_h^{\boxast d_r-1})^{\circledast d_l})$ is continuous wrt h. 
Proof: The GEXIT functional is defined as 

$G_h = \frac{d}{dh'} H(c_{h'} \circledast z_h)\Big|_{h'=h}.$ 

We will find it more convenient to parameterize the densities 
using $b = b(h) = \mathfrak{B}(c_h)$. Let us define 

$D(b', b) = \frac{d}{db'} H(c_{b'} \circledast z_b).$ 

We claim that $D(b', b)$ is continuous in both its arguments. 
Note that $G_h = D(b(h), b(h)) \frac{db(h)}{dh}$ and, correspondingly, we 
define $G_b = D(b, b)$. To show continuity of D in the first 
component note that $(D(b'', b) - D(b', b)) \to 0$ by the smooth 
channel family assumption. To show continuity of D in the 
second component consider H((q,«/ — Q,») © (z b r — z b )). By 
(the Entropy Product Inequality) Lemma [21] property dTTTb . we 
have 

|H((cfe/»- c 6 » )©(z 6 /-z 6 ))|<-^-|Q3 (ci/« - c 6 „ ) 1 1 03 {z b , - z 6 ) | 

in 2 

= J^\b"'-b"\\<B(z b ,-z b )\, 

from which we obtain 

\(D(b",b')-D(b",b))\<^L\<B(z b ,-z b )\, 

showing that D is actually Lipschitz in its second argument. 
It follows, in particular, that G b is continuous in b. Since 
the Battacharyya parameter is a bounded functional and the 
channel family is smooth, we have is continuous in h. 
Consequently, Gh is continuous in h. ■ 

L. Area Theorem 

In Section III-J we introduced the GEXIT curve associated 
to a regular ensemble, see e.g. Figure 2. In Section III-K 
we then derived conditions which guarantee that this curve 
indeed exists and is continuous in a given region. We will 
now discuss the GEXIT integral, the area under the GEXIT 
curve. In order to derive some properties of this integral, we 
will first introduce GEXIT integrals in a slightly more general 
form before we apply them to ensembles. 

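Definition 23 (Basic GEXIT Integral): Given two families 
$\{c_\sigma\}_{\underline{\sigma}}^{\overline{\sigma}}$ and $\{z_\sigma\}_{\underline{\sigma}}^{\overline{\sigma}}$, the GEXIT integral $G(\{c_\sigma, z_\sigma\}_{\underline{\sigma}}^{\overline{\sigma}})$ is defined as 

$G(\{c_\sigma, z_\sigma\}_{\underline{\sigma}}^{\overline{\sigma}}) = \int_{\underline{\sigma}}^{\overline{\sigma}} H\left(\frac{dc_\sigma}{d\sigma} \circledast z_\sigma\right) d\sigma.$ 

■ 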

Discussion: In the above definition, and some definitions below, 
we need regularity conditions to ensure that the integrals 
exist. Rather than stating some general conditions here, we 
will discuss and verify them in the specific cases. E.g., one 
case we will discuss is if the channel family $c_\sigma$ is smooth and 
$z_\sigma$ is a polynomial in $\sigma$ with "coefficients" which are fixed 
densities. 

Definition 24 (GEXIT Integral of Code): Consider a binary 
linear code of length n whose graphical representation is 
a tree. Assume that we are given an ordered family of channels 
$\{c_\sigma\}_{\underline{\sigma}}^{\overline{\sigma}}$. Assume that when all variable nodes "see" the channel 
$c_\sigma$ the distribution of the resulting extrinsic BP message 
density at the i-th variable node is $z_{\sigma,i}$. Then the GEXIT 
integral associated to the i-th variable node is $G(\{c_\sigma, z_{\sigma,i}\}_{\underline{\sigma}}^{\overline{\sigma}})$. 
■ 

Discussion: Note that the distribution $z_{\sigma,i}$ is the best guess 
we can make about bit i given the code constraints and all 
observations except the direct observation on bit i. This is why 
we have called the distribution the extrinsic message density. 
Note further that we have assumed that the graphical structure 
of the code is a tree. Therefore, BP equals MAP, the optimal 
such estimator. 

The GEXIT integral applied to an ensemble is just the 
integral under the GEXIT curve of this ensemble. 



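Definition 25 (GEXIT Integral of Ensemble): Consider the 
$(d_l, d_r)$-regular ensemble and assume that $\{c_\sigma, x_\sigma\}_{\underline{\sigma}}^{\overline{\sigma}}$ is a 
family of FPs of DE. Define $y_\sigma = x_\sigma^{\boxplus d_r - 1}$. Then 

$$G(d_l, d_r, \{c_\sigma, x_\sigma\}_{\underline{\sigma}}^{\overline{\sigma}}) = \int_{\underline{\sigma}}^{\overline{\sigma}} H\Bigl(\frac{dc_\sigma}{d\sigma} \circledast y_\sigma^{\circledast d_l}\Bigr)\, d\sigma.$$

■ 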

In the sequel it will be handy to explicitly evaluate the integral. 
The proof of the following lemma is contained in Appendix G. 

Lemma 26 (Evaluation of GEXIT Integral): Assume that 
communication takes place over an ordered, complete and 
piece-wise smooth family $\{c_h\}_h$, using the degree-distribution 
pair $(d_l, d_r)$. Let $\{c_h, x_h\}_h$ be the FP family of forward DE. 
Set $x = x_{h^*}$, $h^* > h(d_l, d_r, \{c_h\})$, where $h(d_l, d_r, \{c_h\})$ is the 
quantity introduced in Lemma 18. Then, 

$$G(d_l, d_r, \{c_h, x_h\}_{h^*}^{1}) = 1 - \frac{d_l}{d_r} - A,$$

where 

$$A = H(x) + \Bigl(d_l - 1 - \frac{d_l}{d_r}\Bigr) H(x^{\boxplus d_r}) - (d_l - 1) H(x^{\boxplus d_r - 1}).$$

Discussion: Note that this GEXIT integral has a simple graphical 
interpretation; it is the area under the GEXIT curve as e.g. 
shown in the right-hand picture of Figure 1. The condition 
$h^* > h(d_l, d_r, \{c_h\})$ ensures that this curve is well defined 
and integrable. 
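
As a sanity check on the expression $A$, it can be specialized to the BEC. This specialization is an illustration and not part of the lemma: it assumes that $x$ is the L-density of a BEC with erasure probability $x \in [0,1]$ (a slight abuse of notation), so that $H(x) = x$ and $H(x^{\boxplus d}) = 1 - (1-x)^{d}$.

```latex
A_{\mathrm{BEC}}(x) = x + \Bigl(d_l - 1 - \frac{d_l}{d_r}\Bigr)\bigl(1 - (1-x)^{d_r}\bigr)
                      - (d_l - 1)\bigl(1 - (1-x)^{d_r - 1}\bigr),
\qquad
A_{\mathrm{BEC}}(0) = 0, \qquad A_{\mathrm{BEC}}(1) = 1 - \frac{d_l}{d_r}.
```

At $x = 1$, which corresponds to channel entropy 1, the integral $1 - d_l/d_r - A$ vanishes, as it should for an empty integration range.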

We have seen in the last section that the value of a GEXIT 
integral of an ensemble is determined by the expression A. We 
will soon see that it is crucial to describe the region where A 
is negative. The following lemma, whose proof can be found 
in Appendix H, gives a characterization of this property. 

Lemma 27 (Negativity): Let (c, x) be an approximate FP of 
the (di, d r ) -regular ensemble of design rate r = 1 — di/d r . 
Assume that d r > 1 + 5(^^)3 and for some fixed < 5 < 
(^gf-) 2 , d(x,c© (xffl«*r-i)»*-i) < S. Let 

A = H(x) + (di - 1 - ^-)H(x ffld ") - (di - l)H(x Hd "- 1 ). 

For < k < i- e p if H(x) G [(|)^ + ia ^ TF ,£ - 

d , e -4(^-l)(TTSk:)* _ 4 then A < _ Ki 

Discussion: In words, for sufficiently high degrees, $A(x)$ is 
strictly negative for all x with entropies in the range $(0, d_l/d_r)$. 
Note that $d_l/d_r$ corresponds to the Shannon threshold for a 
code of rate $1 - d_l/d_r$. In the preceding lemma we introduced 
the notion of an approximate FP of DE: we say that $(c, x)$ 
is a $\delta$-approximate FP if for some $\delta > 0$ we have 
$d(x, c \circledast (x^{\boxplus d_r - 1})^{\circledast d_l - 1}) < \delta$. 

M. Area Threshold 

The most important goal of this paper is to show that suitable 
coupled ensembles achieve the capacity. The preceding 
(Negativity) Lemma 27 is an important tool for this purpose. 
But we will in fact prove a refined statement, namely we will 
determine the threshold for fixed dds. This threshold is the 
so-called area threshold and it was first introduced in [74] in 
the context of the Maxwell construction. 

(d_l, d_r)   rate     d_l/d_r   h^A (BEC)   h^A (BSC)   h^A (BAWGNC)
(5, 6)       0.1667   0.8333    0.8333      0.8332      0.8333
(4, 5)       0.2      0.8       0.7997      0.7992      0.7994
(3, 4)       0.25     0.75      0.7460      0.7407      0.7428
(4, 6)       0.3333   0.6667    0.6657      0.6633      0.6?4?
(3, 5)       0.4      0.6       0.5910      0.5772      0.5841
(3, 6)       0.5      0.5       0.4881      0.4681      0.4794
(3, 7)       0.5714   0.4286    0.4154      0.3912      0.4057
(3, 8)       0.6250   0.3750    0.3613      0.3345      0.3514
(3, 9)       0.6667   0.3333    0.3196      0.2912      0.3099

TABLE II 

Numerically computed area thresholds for some dds and 
channels. 

Definition 28 (Area Threshold): Consider the $(d_l, d_r)$-regular 
ensemble and transmission over a complete and 
ordered channel family $\{c_h\}_{h=0}^{1}$. For each $h \in [0, 1]$, let $x_h$ 
be the forward DE FP associated to channel $c_h$. The area 
threshold, denote it by $h^A(d_l, d_r, \{c_h\})$, is defined as 

$$h^A(d_l, d_r, \{c_h\}) = \sup\{h \in [0, 1] : A(x_h, d_l, d_r) \le 0\},$$

where $A(x_h, d_l, d_r)$ is equal to A, which is given in Lemma 26, 
evaluated at the FP $x_h$, when transmitting with the $(d_l, d_r)$-regular 
ensemble. ■ 
Note that $A(\Delta_{+\infty}, d_l, d_r) = 0$ and that $x_h = \Delta_{+\infty}$ for 
all $h < h^{BP}(d_l, d_r, \{c_h\})$. Therefore the set over which we 
take the supremum is non-empty and $h^{BP}(d_l, d_r, \{c_h\}) \le 
h^A(d_l, d_r, \{c_h\})$. Also note that we have made the dependence 
of the area threshold on the channel family and the dd 
explicit (see footnote 4). 

Table II gives some values for $h^A(d_l, d_r, \{c_h\})$ for various 
dds and channels. 

Recall that the GEXIT integral has a simple graphical 
interpretation - it is the area under the GEXIT curve, assuming 
of course that both the curve and the integral exist. The area 
threshold is therefore that channel parameter $h^A(d_l, d_r, \{c_h\})$ 
such that the GEXIT integral from $h^A(d_l, d_r, \{c_h\})$ to 1 is 
equal to $1 - d_l/d_r$, the design rate. 

4 We keep the explicit notation of $h(d_l, d_r, \{c_h\})$ and $h^A(d_l, d_r, \{c_h\})$ in 
the statements of the lemmas and theorems but drop it in the proofs for ease 
of exposition. 

Fig. 3. The area threshold for the (10, 20)-regular ensemble and transmission 
over the BSC (horizontal axis: $H(c_\sigma)$). We have $h^A \approx 0.49985$. For 
comparison, the BP threshold is at a channel entropy of roughly 0.2528. 

Consider e.g. the case of the (10, 20)-regular dd depicted 
in Figure 3. From Lemma 19 we know that the GEXIT curve 
is Lipschitz continuous at least in the range $h \in [0.341, 1]$. 
An explicit check shows that $A(x_{h=0.341}) < 0$, so that $h^A > 
0.341$. We know that for $h \in [0.341, 1]$ the expression 
$1 - d_l/d_r - A(x_h)$ corresponds to the area under this GEXIT curve 
between h and 1. This expression is therefore a decreasing 
function in h, or equivalently, $A(x_h)$ is an increasing function 
in h. Using bisection, we can therefore efficiently find the area 
threshold and we get $h^A \approx 0.49985$. Note that for this case 
the area threshold has the interpretation as that unique channel 
parameter $h^A$ so that the enclosed area under the GEXIT curve 
between $h^A$ and 1 is equal to $1 - d_l/d_r$. This is obviously the 
reason for calling $h^A$ the area threshold. 
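
The bisection described above can be sketched for the simplest case, the BEC, where densities reduce to erasure probabilities. This is an illustrative sketch under stated assumptions, not the computation used for the BSC or BAWGNC entries: it assumes $H(x) = x$ and $H(x^{\boxplus d}) = 1 - (1-x)^d$ for the BEC, and the function names are hypothetical. It also assumes that the lower end of the search interval lies above the BP threshold, so that forward DE reaches the non-trivial FP and A is increasing on the interval.

```python
def forward_de_fp_bec(eps, dl, dr, tol=1e-13, max_iter=200000):
    """Forward DE fixed point of the (dl, dr)-regular ensemble on the BEC(eps)."""
    x = 1.0  # start from the "no information" state (erasure probability 1)
    for _ in range(max_iter):
        x_new = eps * (1.0 - (1.0 - x) ** (dr - 1)) ** (dl - 1)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def area_expr_bec(x, dl, dr):
    """The expression A of Lemma 26, specialized to the BEC (assumption)."""
    return (x
            + (dl - 1 - dl / dr) * (1.0 - (1.0 - x) ** dr)
            - (dl - 1) * (1.0 - (1.0 - x) ** (dr - 1)))


def area_threshold_bec(dl, dr, lo, hi=1.0, steps=50):
    """Bisection for the zero of A(x_eps); needs A <= 0 at lo and A > 0 at hi."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if area_expr_bec(forward_de_fp_bec(mid, dl, dr), dl, dr) <= 0.0:
            lo = mid   # still at or below the area threshold
        else:
            hi = mid   # above the area threshold
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # (3, 6)-regular ensemble; 0.45 lies above its BEC BP threshold.
    print(area_threshold_bec(3, 6, lo=0.45))
```

For the (3, 6)-regular ensemble the result should agree, up to numerical precision, with the BEC entry of Table II.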

The same interpretation applies to any dd $(d_l, d_r)$ and any 
BMS channel where the area threshold $h^A(d_l, d_r, \{c_h\})$ is such 
that the GEXIT curve from $h^A(d_l, d_r, \{c_h\})$ up to 1 exists 
and is integrable. Empirically this is true for all regular dds 
and all BMS channels. Consider e.g. the case of the (3, 6) 
ensemble and transmission over the BAWGNC, see Figure 4. 
From Table [] we are assured that this curve exists and is 
smooth at least in the range $h \in [0.5931, 1]$. This region is 
unfortunately too small. But it is easy to compute the curve 
numerically over the whole range. Since the resulting curve is 
smooth everywhere, it is easy to compute the area threshold 
numerically in this way. We get $h^A \approx 0.4792$. 

Fig. 4. The area threshold for the (3, 6)-regular ensemble and transmission 
over the BAWGNC (horizontal axis: $H(c_\sigma)$). This upper bound is given by 
the entropy value where the dark gray vertical line hits the x-axis. 
Numerically the upper bound is at a channel entropy of roughly 0.4792. 
For comparison, the BP threshold is at a channel entropy of roughly 0.4291. 

Fortunately, if we fix the rate then for all dd of sufficiently 
high degree this interpretation applies. 

Lemma 29 (Area Threshold Approaches Shannon): 
Consider a sequence of (di, <i r )-regular ensembles of fixed 
design rate r = 1 — di/d r and with di , d r tending to infinity. 

Assume that d r > 1 + 5(j-^) 3 and that h(di,d r , {ch}) < 

^--die-^- 1 ^^ 1 ^, where h(d/,d r ,{c h }) is defined in 
Lemma [T"8l Then for any BMS channel family {ch} 

£ - d,e-^- 1 )(^) § < h^A,{c h }) < * 

Furthermore, $A(x_{h^A}, d_l, d_r) = 0$ and, for fixed rate 
and increasing degrees, the sequence of the area thresholds 
$h^A(d_l, d_r, \{c_h\})$ converges to the Shannon threshold 
$h^{Shannon}(d_l, d_r) = d_l/d_r = 1 - r$ universally over the whole class 
of BMS channel families. 

Proof: Note that h < h < eJ ^\ l d ^oo 0, where h is 

the universal upper bound on h in Lemma [19] Thus, h < ^- — 






die ^ dr 1 ^ 'lie') 3 is fulfilled for sufficiently large degrees. 

Let us begin with the lower bound on h A . Consider any h < 
h < j^-die-^-Vt^rnr 1 ^ . Let x h be the corresponding BP 
FP. Clearly, H(x h ) < jfc - die'^- 1 ^^ 1 ^ . Suppose that 

H(x) £ [(l)^ +F ^ F ^-d ie - 4 ^- 1 ^) l - K ].Then 
from the (Negativity) Lemma [27] it follows that A(x h ) < 
and hence h A > f- - c^e -4 ^ -1 ^ 2 11T ^ . Now suppose that 

H(x h ) < (|)~^~+ 73-3ip (the left boundary in the Negativity 
lemma). Since h > h, we know from Corollary |20] that H(x h ) 
is a continuous function wrt h with $H(x_{h=1}) = 1$. Thus, by 
the intermediate value theorem, there must exist a channel entropy 
$h^*$ such that $H(x_{h^*})$ lies within the interval prescribed by the 
Negativity lemma. Therefore, also in this case h A > ^- — 

Let us now consider the upper bound. From the above arguments, 
since $h(d_l, d_r, \{c_h\}) < h^A$, the BP GEXIT integral from $h^A$ to 1 is 
given by Lemma 26. If we combine this with the definition 
of the area threshold, i.e., the expression A in Lemma 26 is 
non-positive at $h^A$, we get that the BP GEXIT integral at the 
area threshold is at least equal to $1 - d_l/d_r$. Now, note that the 
BP GEXIT curve is always upper bounded by 1 and so the 
integral from $h^A$ to 1 can be at most equal to $1 - h^A$. Putting 
things together we have that $h^A \le h^{Shannon} = d_l/d_r$. 

Let us prove the last claim of the lemma. We want to show 
that at the area threshold $A(x_{h^A}, d_l, d_r) = 0$. Recall that the 
area threshold was defined as the supremum over all h so that 
$A(x_h, d_l, d_r)$ is less than or equal to zero. Therefore, all we 
need to show is that $A(x_h, d_l, d_r)$ is continuous as a function 
of h around $h^A$. 

Note that $h^A$ is strictly larger than $h(d_l, d_r, \{c_h\})$. Thus, from 
Corollary 20 we conclude that the Wasserstein distance $d(x_h, x_{h^A})$ 
is continuous wrt h. It is not hard to verify that $A(x_h, d_l, d_r)$ 
is also continuous wrt the Wasserstein distance. Combining, 
we get that $A(x_h, d_l, d_r)$ is continuous wrt h around $h^A$. ■ 



III. Coupled Systems 
A. Spatially Coupled Ensemble 

Our goal is to show that coupled ensembles can achieve 
capacity on general BMS channels. Let us recall the definition 
of an ensemble which is particularly suited for the purpose of 
analysis. We call it the (di,d r ,L,w) ensemble. This is the 
ensemble we use throughout the paper. For a quick historical 
review on some of the many variants see Section I-B. 

The variable nodes of the ensemble are at positions $[-L, L]$, 
$L \in \mathbb{N}$. At each position there are M variable nodes, $M \in 
\mathbb{N}$. Conceptually we think of the check nodes to be located 
at all integer positions from $[-\infty, \infty]$. Only some of these 
positions actually interact with the variable nodes. At each 
position there are $\frac{d_l}{d_r} M$ check nodes. It remains to describe 
how the connections are chosen. We assume that each of the $d_l$ 
connections of a variable node at position i is uniformly and 
independently chosen from the range $[i, \dots, i + w - 1]$, where 
w is a "smoothing" parameter. In the same way, we assume 
that each of the $d_r$ connections of a check node at position i 
is independently chosen from the range $[i - w + 1, \dots, i]$. A 
detailed construction of this ensemble can be found in [53]. 

For the whole paper we will always be interested in the 
limit when M tends to infinity while L as well as di and d r 
stay fixed. In this limit we can analyze the system via density 
evolution, simplifying our task. 

Not surprisingly, spatially coupled ensembles inherit many 
of their properties from the underlying ensemble. Perhaps most 
importantly, the local connectivity is the same. Further, the 
design rate of the coupled ensemble is close to that of the 
original one. A proof of the following lemma can be found in 
El. 

Lemma 30 (Design Rate): The design rate of the ensemble 

(di, d r , L, w), with w < L, is given by 

R(d h d r ,L,u,) = (l- Tr )~ Tr • 

There is an entirely equivalent way of describing a spatially 
coupled ensemble in terms of a circular construction. This 
construction has the advantage that it is completely symmetric. 
This simplifies some of the ensuing proofs. 

Definition 31 (Circular Ensemble): Given a $(d_l, d_r, L, w)$ 
ensemble we can associate to it a circular ensemble. This 
circular ensemble has $w - 1$ extra sections, all of whose 
variable nodes are set to zero. To be concrete, we assume that 
the sections are numbered from $[-L, L + w - 1]$, where the 
sections in $[-L, L]$ are the sections of the original ensemble 
and the sections in $[L + 1, L + w - 1]$ are the extra sections. 
In this new circular ensemble all index calculations (for 
the connections) are done modulo $2L + w$ and indices are 
mapped to the range $[-L, L + w - 1]$. For all positions in the 
range $i \in [L + 1, L + w - 1]$ the channel is $c_i = \Delta_{+\infty}$, 
and consequently, $x_i = \Delta_{+\infty}$. For all "regular" positions 
$i \in [-L, L]$ the associated channel is the standard channel 
c. This circular ensemble has design rate equal to $1 - d_l/d_r$. 
■ 

As we will see, it is the global structure which helps all the 
individual codes to perform so well - individually they can 
only achieve their BP threshold, but together they reach their 
MAP performance. 

B. Density Evolution for Coupled Ensemble 

Let us describe the DE equations for the $(d_l, d_r, L, w)$ 
ensemble. In the sequel, densities are L-densities. Let c denote 
the channel and let $x_i$ denote the density which is emitted 
by variable nodes at position i. Throughout the paper, $\Delta_{+\infty}$ 
denotes an L-density with all its mass at $+\infty$ and represents 
the perfect decoding density. Also, $\Delta_0$ denotes an L-density 
with all its mass at 0 and represents a density with no 
information. 

Definition 32 (DE of the $(d_l, d_r, L, w)$ Ensemble): Let $x_i$, 
$i \in \mathbb{Z}$, denote the average L-density which is emitted by 
variable nodes at position i. For $i \notin [-L, L]$ we set $x_i = \Delta_{+\infty}$. In 
words, the boundary variable nodes have perfect information. 
For $i \in [-L, L]$, the FP condition implied by DE is 

$$x_i = c \circledast \Bigl(\frac{1}{w}\sum_{j=0}^{w-1}\Bigl(\frac{1}{w}\sum_{k=0}^{w-1} x_{i+j-k}\Bigr)^{\boxplus d_r - 1}\Bigr)^{\circledast d_l - 1}. \qquad (12)$$

Define 

$$g(x_{i-w+1}, \dots, x_{i+w-1}) = \Bigl(\frac{1}{w}\sum_{j=0}^{w-1}\Bigl(\frac{1}{w}\sum_{k=0}^{w-1} x_{i+j-k}\Bigr)^{\boxplus d_r - 1}\Bigr)^{\circledast d_l - 1}.$$

Note that $g(x, \dots, x) = (x^{\boxplus d_r - 1})^{\circledast d_l - 1}$, where the right-hand 
side represents DE (without the effect of the channel) for the 
underlying $(d_l, d_r)$-regular ensemble. Also define 

$$\hat{g}(x_{i-w+1}, \dots, x_{i+w-1}) = \Bigl(\frac{1}{w}\sum_{j=0}^{w-1}\Bigl(\frac{1}{w}\sum_{k=0}^{w-1} x_{i+j-k}\Bigr)^{\boxplus d_r - 1}\Bigr)^{\circledast d_l}.$$

As before we see that $\hat{g}(x, \dots, x)$ denotes the EXIT value 
of DE for the underlying $(d_l, d_r)$-regular ensemble. It is not 
hard to see [62] that both $g(x_{i-w+1}, \dots, x_{i+w-1})$ as well 
as $\hat{g}(x_{i-w+1}, \dots, x_{i+w-1})$ are monotone wrt degradation in 
all their arguments $x_j$, $j = i - w + 1, \dots, i + w - 1$. 
More precisely, if we degrade any of the densities $x_j$, $j = 
i - w + 1, \dots, i + w - 1$, then $g(\cdot)$ (respectively $\hat{g}(\cdot)$) is 
degraded. We say that $g(\cdot)$ (respectively $\hat{g}(\cdot)$) is monotone 
in its arguments. ■ 
Lemma 33 (Sensitivity of DE): Fix the parameters $(d_l, d_r)$ 
and w and assume that $d(a_i, b_i) \le \kappa$, $i = -w + 1, \dots, w - 1$. 
Then 

$$d(c \circledast g(a_{-w+1}, \dots, a_{w-1}),\ c \circledast g(b_{-w+1}, \dots, b_{w-1})) \le 2(d_l - 1)(d_r - 1)\kappa.$$

Proof: For $i \in [0, w - 1]$, define $\bar{a}_i = \frac{1}{w}\sum_{k=0}^{w-1} a_{i-k}$ and 
$\bar{b}_i = \frac{1}{w}\sum_{k=0}^{w-1} b_{i-k}$. Set $c_i = \bar{a}_i^{\boxplus d_r - 1}$ and $d_i = \bar{b}_i^{\boxplus d_r - 1}$. Then 
using properties (v) and (vii) of Lemma 13 we see that 

$$d(c_i, d_i) \le (d_r - 1)\, d(\bar{a}_i, \bar{b}_i) \le (d_r - 1)\kappa.$$

Using once again property (v) of Lemma 13, 

$$d\Bigl(\frac{1}{w}\sum_{i=0}^{w-1} c_i,\ \frac{1}{w}\sum_{i=0}^{w-1} d_i\Bigr) \le (d_r - 1)\kappa.$$

Finally, using property (vi) of Lemma 13, 

$$d(c \circledast g(a_{-w+1}, \dots, a_{w-1}),\ c \circledast g(b_{-w+1}, \dots, b_{w-1}))
= d\Bigl(c \circledast \Bigl(\frac{1}{w}\sum_{i=0}^{w-1} c_i\Bigr)^{\circledast d_l - 1},\ c \circledast \Bigl(\frac{1}{w}\sum_{i=0}^{w-1} d_i\Bigr)^{\circledast d_l - 1}\Bigr)
\le 2(d_l - 1)(d_r - 1)\kappa.$$

■ 

C. Fixed Points and Admissible Schedules 

Definition 34 (FPs of Density Evolution): Consider DE for 
the $(d_l, d_r, L, w)$ ensemble. Let $\underline{x} = (x_{-L}, \dots, x_L)$. We call $\underline{x}$ 
the constellation (of L-densities). We say that $\underline{x}$ forms a FP 
of DE with channel c if $\underline{x}$ fulfills (12) for $i \in [-L, L]$. As 
a short hand we say that $(c, \underline{x})$ is a FP. We say that $(c, \underline{x})$ is 
a non-trivial FP if $x_i \ne \Delta_{+\infty}$ for at least one $i \in [-L, L]$. 
Again, for $i \notin [-L, L]$, $x_i = \Delta_{+\infty}$. ■ 

Definition 35 (Forward DE and Admissible Schedules): 
Consider forward DE for the $(d_l, d_r, L, w)$ ensemble. More 
precisely, pick a channel c. Initialize $\underline{x}^{(0)} = (\Delta_0, \dots, \Delta_0)$. 
Let $\underline{x}^{(\ell)}$ be the result of $\ell$ rounds of DE. This means that 
$\underline{x}^{(\ell+1)}$ is generated from $\underline{x}^{(\ell)}$ by applying the DE equation 
(12) to each section $i \in [-L, L]$, 

$$x_i^{(\ell+1)} = c \circledast g(x_{i-w+1}^{(\ell)}, \dots, x_{i+w-1}^{(\ell)}).$$

We call this the parallel schedule. 

More generally, consider a schedule in which in step $\ell$ an 
arbitrary subset of the sections is updated, constrained only by 
the fact that every section is updated in infinitely many steps. 
We call such a schedule admissible. We call $\underline{x}^{(\ell)}$ the resulting 
sequence of constellations. ■ 

Lemma 36 (FPs of Forward DE): Consider forward DE 
for the $(d_l, d_r, L, w)$ ensemble. Let $\underline{x}^{(\ell)}$ denote the sequence 
of constellations under an admissible schedule. Then $\underline{x}^{(\ell)}$ 
converges to a FP of DE, with each component being a symmetric 
L-density and this FP is independent of the schedule. In 
particular, it is equal to the FP of the parallel schedule. 

Proof: Consider first the parallel schedule. We claim that 
the vectors $\underline{x}^{(\ell)}$ are ordered, i.e., $\underline{x}^{(0)} \succ \underline{x}^{(1)} \succ \cdots \succ \underline{0}$ (the 
ordering is section-wise and $\underline{0}$ is the vector of $\Delta_{+\infty}$). This is 
true since $\underline{x}^{(0)} = (\Delta_0, \dots, \Delta_0)$, whereas $\underline{x}^{(1)} \prec (c, \dots, c) \prec 
(\Delta_0, \dots, \Delta_0) = \underline{x}^{(0)}$. It now follows by induction on the 
number of iterations and the monotonicity of the function 
$g(\cdot)$ that the sequence $\underline{x}^{(\ell)}$ is monotonically decreasing. More 
precisely, we have $x_i^{(\ell+1)} \prec x_i^{(\ell)}$. Hence, from Lemma 4.75 
in [62], we conclude that each section converges to a limit 
density which is also symmetric. Call the limit $\underline{x}^{(\infty)}$. Since 
the DE equations are continuous it follows that $\underline{x}^{(\infty)}$ is a FP 
of DE (12) with parameter c. We call $\underline{x}^{(\infty)}$ the FP of forward 
DE. 

That the limit (exists in general and that it) does not depend 
on the schedule follows by standard arguments and we will 
be brief. The idea is that for any two admissible schedules the 
corresponding computation trees are nested. This means that 
if we look at the computation graph of, let us say, schedule 1 
at time $\ell$ then there exists a time $\ell'$ so that the computation 
graph under schedule 2 is a superset of the first computation 
graph. To be able to come to this conclusion we have crucially 
used the fact that for an admissible schedule every section 
is updated infinitely often. This shows that the performance 
under schedule 2 is at least as good as the performance under 
schedule 1. Since the roles of the schedules are symmetric, 
the claim follows. ■ 


D. Entropy, Error and Battacharyya Functionals for Coupled 
Ensemble 

Definition 37 (Entropy, Error, and Battacharyya): Let $\underline{x}$ be 
a constellation. Let $F(\cdot)$ denote either the $H(\cdot)$ (entropy), $\mathfrak{E}(\cdot)$ 
(error probability), or $\mathfrak{B}(\cdot)$ (Battacharyya) functional defined 
in Section II-D. 

We define the (normalized) entropy, error and Battacharyya 
functionals of the constellation $\underline{x}$ to be 

$$F(\underline{x}) = \frac{1}{2L + 1}\sum_{i=-L}^{L} F(x_i).$$

E. BP GEXIT Curve for Coupled Ensemble 

We now come to a key object, the BP GEXIT curve for 
the coupled ensemble. We have discussed how to compute BP 
GEXIT curves for uncoupled ensembles in Section IIII-EI For 
coupled ensembles the procedure is similar. 

In Section III-C we have seen that for coupled systems FPs 
of forward DE are well defined and that they can be computed 
by applying a parallel schedule. This procedure allows us to 
compute some FPs. 

But we can also use DE at fixed entropy, as discussed 
in Section [ill to compute further FPs (in particular unstable 
ones). More precisely, fix the desired average entropy of the 
constellation, call it h. Start with the initialization $\underline{x}^{(0)} = \underline{\Delta}_0$, 
the vector of all $\Delta_0$. In each iteration proceed as follows. 
Perform one round of DE without incorporating the channel, 
i.e., set 

$$x_i^{(\ell)} = g(x_{i-w+1}^{(\ell-1)}, \dots, x_{i+w-1}^{(\ell-1)}).$$

Now find a channel $c_\sigma \in \{c_\sigma\}$, assuming it exists, so that after 
the convolution with this channel the average entropy of the 
constellation is equal to h. Continue this procedure until the 
constellation has converged (under some suitable metric). 

Assume that we have computed (via the above procedure) 
a complete family $\{c_\sigma, \underline{x}_\sigma\}$ of FPs of DE, i.e., a family so 
that for each $h \in [0, 1]$, there exists a parameter $\sigma$ so that 
$h = \frac{1}{2L+1}\sum_{i=-L}^{L} H(x_{\sigma,i})$. Then we can derive from it a BP 
GEXIT curve by projecting it onto 

$$\Bigl\{ H(c_\sigma),\ \frac{1}{2L+1}\sum_{i=-L}^{L} G\bigl(c_\sigma, \hat{g}(x_{\sigma,i-w+1}, \dots, x_{\sigma,i+w-1})\bigr) \Bigr\},$$

where $\hat{g}(\cdot)$ was introduced in Section III-B and 
$\frac{1}{2L+1}\sum_{i=-L}^{L} G(c_\sigma, \hat{g}(x_{\sigma,i-w+1}, \dots, x_{\sigma,i+w-1}))$ is the 
(normalized) GEXIT function of the constellation $\underline{x}_\sigma$. 
Figure 5 shows the result of this numerical computation when 
transmission takes place over the BAWGNC (left-hand side) 
and the BSC (right-hand side). Note that the resulting curves 
look similar to the curves when transmission takes place over 
the BEC, see [53]. For small values of L the curves are far to 
the right due to the significant rate loss that is incurred at the 
boundary. For L around 10 and above, the BP threshold of 
each ensemble is close to the area threshold of the underlying 
(3, 6)-regular ensemble, namely 0.4792 for the BAWGNC 
and 0.4680 for the BSC (see the values in Table II). The 
picture suggests that the threshold saturation effect which was 
shown analytically to hold for the BEC in [74] also occurs 
for general BMS channels. 

Fig. 5. BP GEXIT curves of the ensemble $(d_l = 3, d_r = 6, L)$ for 
L = 4, 8, 16, and 32 and transmission over the BAWGNC (left) and the 
BSC (right); horizontal axes: $H(c_\sigma)$. The BP thresholds are 
$h^{BP}_{BAWGNC/BSC}(3, 6, 4) = 0.4992/0.4878$, 
$h^{BP}_{BAWGNC/BSC}(3, 6, 8) = 0.4850/0.47303$, 
$h^{BP}_{BAWGNC/BSC}(3, 6, 16) = 0.4849/0.4729$, 
$h^{BP}_{BAWGNC/BSC}(3, 6, 32) = 0.4849/0.4729$. The light/dark 
gray areas mark the interior of the BP/MAP GEXIT function of the underlying 
(3, 6)-regular ensemble, respectively. 

The aim of this paper is to prove rigorously that the situation 
is indeed as indicated in Figure 5, i.e., that the BP threshold 
of coupled ensembles is essentially equal to the area threshold 
of the underlying uncoupled ensemble. 



F. Review for the BEC 

Let us briefly recall the main result of [53] which deals 
with transmission over the BEC. Let $\epsilon^{BP}_{BEC}(d_l, d_r, L, w)$ and 
$\epsilon^{MAP}_{BEC}(d_l, d_r, L, w)$ denote the BP threshold and the MAP 
threshold of the $(d_l, d_r, L, w)$ ensemble. Also, let $\epsilon^{MAP}_{BEC}(d_l, d_r)$ 
denote the MAP threshold of the underlying $(d_l, d_r)$-regular 
LDPC ensemble. Then the main result of [53] states that 

$$\lim_{w \to \infty}\lim_{L \to \infty} \epsilon^{BP}_{BEC}(d_l, d_r, L, w) = \lim_{w \to \infty}\lim_{L \to \infty} \epsilon^{MAP}_{BEC}(d_l, d_r, L, w) = \epsilon^{MAP}_{BEC}(d_l, d_r).$$

Also (see [62]), as $d_l, d_r \to \infty$, with the ratio $d_l/d_r$ 
fixed, $\epsilon^{MAP}_{BEC}(d_l, d_r) \to d_l/d_r$. Thus, with increasing degrees, 
$(d_l, d_r, L, w)$ ensembles under BP decoding achieve the Shannon 
capacity for the BEC. 



G. First Result 

Before we state and prove our main result (namely that 
coupled codes can achieve capacity also for general BMS 
channels), let us first quickly discuss a simple argument which 
shows that spatial coupling of codes does have a non-trivial 
effect. 

First consider the uncoupled case. We have seen in 
Lemma 11 that when we fix the design rate $1 - d_l/d_r$ and 
increase the degrees the BP threshold converges to 0. What 
happens if we couple such ensembles? We know that for the 
BEC such ensembles achieve capacity. The next lemma asserts 
that this implies a non-trivial BP threshold also for general 
BMS channels. 

Lemma 38 (Lower Bound on Coupled BP Threshold): 
Consider transmission over an ordered and complete family 
$\{c_h\}$ of BMS channels using a $(d_l, d_r, L, w)$ ensemble and 
BP decoding. 

Let $h^{BP} = h^{BP}(d_l, d_r, L, w, \{c_h\})$ denote the corresponding 
BP threshold and let $\epsilon^{BP} = \epsilon^{BP}(d_l, d_r, L, w)$ denote the 
corresponding BP threshold for transmission over the BEC. Then 

$$\mathfrak{B}\bigl(c_{h^{BP}(d_l, d_r, L, w, \{c_h\})}\bigr) \ge \epsilon^{BP}(d_l, d_r, L, w). \qquad (13)$$

In particular, for every $\delta > 0$ there exists a $w \in \mathbb{N}$ and a dd 
pair $(d_l, d_r)$ with $d_l/d_r$ fixed, so that 

$$\mathfrak{B}\bigl(c_{h^{BP}(d_l, d_r, L, w, \{c_h\})}\bigr) > d_l/d_r - \delta.$$

Proof: Consider DE of the coupled ensemble (cf. (12)). 
Applying the Battacharyya functional, we get 

$$\mathfrak{B}(x_i) = \mathfrak{B}(c_h)\, \mathfrak{B}\Bigl(\frac{1}{w}\sum_{j=0}^{w-1}\Bigl(\frac{1}{w}\sum_{k=0}^{w-1} x_{i+j-k}\Bigr)^{\boxplus d_r - 1}\Bigr)^{d_l - 1}, \qquad (14)$$

where we use the multiplicative property of the Battacharyya 
functional at the variable node side. 

Using the linearity of the Battacharyya functional and 
extremes of information combining bounds for the check node 
convolution ([62, Chapter 4]) we get 

$$\mathfrak{B}(x_i) \le \mathfrak{B}(c_h)\Bigl(1 - \frac{1}{w}\sum_{j=0}^{w-1}\Bigl(1 - \frac{1}{w}\sum_{k=0}^{w-1} \mathfrak{B}(x_{i+j-k})\Bigr)^{d_r - 1}\Bigr)^{d_l - 1}. \qquad (15)$$

The preceding set of equations is formally equivalent to the 
DE equations for the same spatially coupled ensemble and 
the BEC. Therefore, if $\mathfrak{B}(c_h) < \epsilon^{BP}(d_l, d_r, L, w)$ then the DE 
recursions, initialized with $c_h$, must converge to $\Delta_{+\infty}$, which 
implies (13). 

Further, from [53] we know that for sufficiently large 
degrees $(d_l, d_r)$, with their ratio fixed, and with w sufficiently 
large, $\epsilon^{BP}(d_l, d_r, L, w)$ approaches $d_l/d_r$ arbitrarily closely 
(see the discussion in the preceding section), which proves 
the final claim. ■ 
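
The scalar recursion (15), taken with equality, is the BEC form of the coupled DE equations, and it can be iterated directly. The following is a minimal sketch under stated assumptions: the function name and the stopping rule are hypothetical, sections outside $[-L, L]$ are held at 0 (perfect information), and the choice $w = 1$ is used to recover the uncoupled recursion for comparison.

```python
def coupled_recursion(eps, dl, dr, L, w, max_iter=5000, tol=1e-12):
    """Iterate recursion (15) with equality (BEC form of coupled DE).

    Returns the values at positions -L..L. Positions outside [-L, L]
    are fixed to 0, i.e. perfectly known boundary sections.
    """
    n, pad = 2 * L + 1, w - 1
    # Forward DE: start the 2L+1 interior sections at 1 (no information).
    x = [0.0] * pad + [1.0] * n + [0.0] * pad
    for _ in range(max_iter):
        new = list(x)
        for i in range(pad, pad + n):
            acc = 0.0
            for j in range(w):  # checks at positions i, ..., i + w - 1
                avg = sum(x[i + j - k] for k in range(w)) / w
                acc += (1.0 - avg) ** (dr - 1)
            new[i] = eps * (1.0 - acc / w) ** (dl - 1)
        delta = max(abs(a - b) for a, b in zip(new, x))
        x = new
        if delta < tol:
            break
    return x[pad:pad + n]


if __name__ == "__main__":
    eps = 0.45  # above the uncoupled (3, 6) BEC BP threshold
    coupled = coupled_recursion(eps, 3, 6, L=16, w=3)
    uncoupled = coupled_recursion(eps, 3, 6, L=16, w=1)
    print(max(coupled), min(uncoupled))
```

With these parameters the coupled recursion is driven to zero by the boundary, whereas the uncoupled one ($w = 1$) stalls at a non-trivial fixed point. This illustrates the mechanism the lemma exploits: whenever $\mathfrak{B}(c_h)$ is below the coupled BEC threshold, the bound (15) forces the Battacharyya parameters to zero.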

Example 39 ((3,6) Ensemble and BSC(p)): Let us specialize 
to the case of transmission over the BSC using the (3,6)-regular 
ensemble. Then we have $\mathfrak{B}(c) = 2\sqrt{p(1 - p)}$. Using 
the above argument and solving for $\epsilon$ in $2\sqrt{\epsilon(1 - \epsilon)} \ge \frac{1}{2}$, we 
conclude that by a proper choice of w and $(d_l, d_r)$ we can 
transmit reliably at least up to an error probability of 0.067. 
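
The number 0.067 follows from solving $2\sqrt{p(1-p)} = 1/2$ for the smaller root. A short check of this arithmetic, with hypothetical helper names:

```python
import math


def bsc_battacharyya(p):
    """Battacharyya parameter of the BSC with crossover probability p."""
    return 2.0 * math.sqrt(p * (1.0 - p))


def crossover_for_battacharyya(target):
    """The p in [0, 1/2] with 2*sqrt(p*(1-p)) == target, for 0 <= target <= 1."""
    # 4 p (1 - p) = target^2  =>  p = (1 - sqrt(1 - target^2)) / 2
    return 0.5 * (1.0 - math.sqrt(1.0 - target ** 2))


if __name__ == "__main__":
    p_star = crossover_for_battacharyya(0.5)  # d_l / d_r = 1/2 for the (3, 6) dd
    print(p_star, bsc_battacharyya(p_star))
```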

Combining the above result with Lemma 4 we conclude 
that the BP threshold of the coupled ensemble is at least 
$(d_l/d_r)^2 - \delta$. In summary, for general BMS channels and 
regular ensembles of fixed rate and increasing degrees, their 
uncoupled BP threshold tends to 0 but their coupled BP 
threshold is lower bounded by a non-zero value. We conclude 
that coupling changes the performance in a fundamental way. 
In the rest of the paper we will strengthen the above result by 
showing that this non-zero value is in fact the area threshold 
of the underlying ensemble and as degrees become large, this 
will tend to the Shannon threshold, $d_l/d_r$. 

IV. Main Results 

A. Admissible Parameters 

In the sequel we will impose some restrictions on the 
parameters. Rather than repeating these restrictions in each 
statement, we collect them once and for all and give them a 
name. 

Definition 40 (Admissible Parameters): Fix the design rate 
r of the uncoupled system. We say that the parameters $(d_l, d_r)$ 
and w are admissible if the following conditions are fulfilled 
with $r = 1 - d_l/d_r$: 

(i) d r > V3b\n(b), b= lnW i K1 _ r} , 



(ii) 2(d l - l)(d r ~ 1)(1 - c 2 )^ < 1, c = (1 - r)(l - 

(iii) h(di,d r ,{c h }) < (l-rXl-dre- 4 ^- 1 )^^) 1 )-^ 
where h(di, d r , {ch}) is the bound stated in Lemma [T8l 

(iv) w > 2dm, 

(v) w >2{d l -\)(d r ~l){^§^Y, 

(vi) w > 2{di - l)(d r - 1)^(4(^/2 + ^(4 - I))) 2 , 
We say that the ensemble $(d_l, d_r, L, w)$ is admissible if the 
parameters $(d_l, d_r)$ and w are admissible. If we are only 
concerned about the conditions on $(d_l, d_r)$, then we will say 
that $(d_l, d_r)$ is admissible. ■ 
Discussion: Conditions (i), (ii), and (iii) are fulfilled if we take 
the degrees sufficiently large. Conditions (iv), (v), and (vi) can 
all be fulfilled by picking a sufficiently large connection width 
w. 

Why do we impose these conditions? At several places we 
use simple extremes of information combining bounds and 
these bounds are loose and require, for the proof to work, the 
above conditions. We believe that with sufficient effort these 
bounds can be tightened and so the restrictions on the degrees 
can be removed or at least significantly loosened. We leave 
this as an interesting open problem. 

Numerical experiments indicate that for any $3 \le d_l < d_r$ 
and $w \ge 2$ the threshold saturation phenomenon happens, with 
a "wiggle-size" which vanishes exponentially in w. 

Note that the above bounds imply the following bounds 
which we will need at various places: 

(vii) d r > + 5^ ln(2(d r - l) 3 )), 

(viii) d r > 1 + 5(^)1 

Instead of condition (Imb above we can impose the stronger 
but somewhat easier to check condition h < (1 — r)(l — 

d r e~ 4: ( dr ~ 1 ^ iii -L, where h is the upper bound stated 

in Lemma [19] or even further strengthen the condition to 

< (1 - r)(l - dre- 4 ^- 1 ^-^^ ) - i. The last 

condition can be easily checked to be satisfied for sufficiently 
large degrees. 

B. Main Result 

Theorem 41 (BP Threshold of the $(d_l, d_r, L, w)$ Ensemble): 
Consider transmission over a complete, smooth, and ordered 
family of BMS channels, denote it by $\{c_h\}$, using the 
admissible ensemble $(d_l, d_r, L, w)$. Let $h^{BP}(d_l, d_r, L, w, \{c_h\})$ 
and $h^{MAP}(d_l, d_r, L, w, \{c_h\})$ denote the corresponding BP and 
MAP threshold. Further, let $R(d_l, d_r, L, w)$ denote the design 
rate of this ensemble and set $r = 1 - d_l/d_r$. Finally, let 
$h^A(d_l, d_r, \{c_h\})$ denote the area threshold of the underlying 
$(d_l, d_r)$-regular ensemble and the given channel family. Then 

$$h^A(d_l, d_r, \{c_h\}) - f(d_l, d_r, w) \le h^{BP}(d_l, d_r, L, w, \{c_h\}) \qquad (16)$$

$$\le h^{MAP}(d_l, d_r, L, w, \{c_h\}) \le h^A(d_l, d_r, \{c_h\}) + \frac{(w - 1)(d_r - 1)^3}{L}, \qquad (17)$$

where f(d u d r ,w) = 8(d r - l) 3 (\/2 + ^di(d r - 

^))\f ^■ '~ 1 ^ dr ~ 1 ^. Note that f(di,d r ,w) depends only on 
the dd $(d_l, d_r)$ and w but is universal wrt the channel family 
$\{c_h\}$. Furthermore, 

$$\lim_{w \to \infty}\lim_{L \to \infty} R(d_l, d_r, L, w) = 1 - \frac{d_l}{d_r}. \qquad (18)$$

Discussion: 

(i) The bound $h^{BP} \le h^{MAP}$ is trivial and only listed for 
completeness. Consider the upper bound on $h^{MAP}$ stated 
in (17). Start with the circular ensemble stated in 
Definition 31. The original ensemble is recovered by setting 
the $w - 1$ consecutive positions in $[L + 1, L + w - 1]$ to 0. 
Define $K = 2L + w$. We first provide a lower bound on 
the conditional entropy for the circular ensemble when 
transmitting over a BMS channel with entropy h. We then 
show that setting $w - 1$ sections to 0 does not significantly 
decrease this entropy. Overall this gives an upper bound 
on the MAP threshold of the coupled ensemble in terms 
of the area threshold of the underlying ensemble. 
It is not hard to see that the BP GEXIT curve is the same 
for both the $(d_l, d_r)$-regular ensemble and the circular 
ensemble (when all sections have the standard channel). 
Indeed, forward DE (see Definition 35) converges to 
the same FP for both ensembles. Consider the circular 
ensemble and let $h \in (h^A, 1]$. The conditional entropy 
when transmitting over the BMS channel with entropy h 
is at least equal to $1 - d_l/d_r$ minus the area under the 
BP EXIT curve of $[h, 1]$ (see Theorem 3.120 in [62]). 
Indeed, from the proof of Theorem 4.172 in [62], we 
have 

$$\liminf_{n \to \infty} E[H(X_1^n \mid Y_1^n(h))]/n \ge 1 - \frac{d_l}{d_r} - G(\{c_h, x_h\}_{h}^{1}).$$

Note that the above integral, G({ch, x h }^) is evaluated 
at the BP FPs. From Lemmas \T9\ and the BP FP 
densities x h exist and the GEXIT integral is well-defined 
for all h > h A > h. 

Here, the entropy is normalized by $n = KM$, where $K$
is the length of the circular ensemble and $M$ denotes
the number of variable nodes per section. Assume that
we set $w-1$ consecutive sections of the circular ensemble
to 0 in order to recover the original ensemble.
As a consequence, we "remove" an entropy (degrees
of freedom) of at most $(w-1)/K$ from the circular
system. The remaining entropy is therefore positive (and
hence we are above the MAP threshold of the coupled
ensemble) as long as $1 - d_l/d_r - (w-1)/K - G(\{c_h, x_h\}_{h}^{1}) > 0$.
From Lemmas [26] and [29] we have
$G(\{c_h, x_h\}_{h^A}^{1}) = 1 - d_l/d_r$, so that the condition becomes
$G(\{c_h,x_h\}_{h^A}^{1}) - G(\{c_h,x_h\}_{h}^{1}) > (w-1)/K$. For all
channels with h > h A we have G(c h ,x h ) > 2( - rf 3jp ■ 
For a derivation of this statement we refer the reader 
to the proof of part (vi) of Theorem [47] This implies 
that G({c h ,x h }^) > (h - h A )/(2(d r - l) 3 ). Further- 
more, G({c h ,x h }^) < G({c h ,x 11 }^ 1 ). This follows from 
the definition of area threshold, which implies that for 
h > h A , A(x h ,di,d r ) > (cf. Lemma l26t and then 



combining with Lemma [26]. Putting things together we
get

\[ G(\{c_h,x_h\}_{h^A}^{1}) - G(\{c_h,x_h\}_{h}^{1}) \ge \frac{h - h^{A}}{2(d_r-1)^3}. \]

We get the stated condition on $h^{MAP}$ by lower bounding
$K$ by $2L$.

(ii) The lower bound on $h^{BP}(d_l,d_r,L,w,\{c_h\})$ expressed in
(16) is the main result of this paper. It shows that, up to
a term which tends to zero when $w$ tends to infinity, the
BP threshold of the coupled ensemble is at least as large
as the area threshold of the underlying ensemble.
Empirical evidence suggests that the convergence speed
wrt $w$ is exponential. Our bound only guarantees a
convergence speed of order $\sqrt{1/w}$.

Let us summarize. In order to prove Theorem [41] we "only"
have to prove the lower bound on $h^{BP}$. Not surprisingly, this
is also the most difficult to accomplish. The remainder of this
paper is dedicated to this task.
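
As a rough numerical illustration of the upper bound (17), the following sketch evaluates the excess term $(w-1)(d_r-1)^3/L$ over the area threshold and the smallest chain length $L$ for which this term drops below a target gap. The function names and the use of exact fractions are illustrative assumptions and not part of the original argument.

```python
import math
from fractions import Fraction


def map_excess(dr: int, w: int, L: int) -> Fraction:
    """Excess term (w - 1)(dr - 1)^3 / L appearing in the upper bound (17)."""
    return Fraction((w - 1) * (dr - 1) ** 3, L)


def min_chain_length(dr: int, w: int, gap: Fraction) -> int:
    """Smallest L for which the excess term is at most `gap`."""
    return math.ceil(Fraction((w - 1) * (dr - 1) ** 3) / gap)


if __name__ == "__main__":
    L = min_chain_length(6, 3, Fraction(1, 100))
    print("L =", L, "excess =", map_excess(6, 3, L))
```

The cubic dependence on $d_r$ means that the chain length needed for a given gap grows quickly with the check degree, which is consistent with the remark that the bound is trivial but loose.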

C. Extensions 

In Theorem [41] we start with a smooth, complete and 
ordered channel family. But it is straightforward to convert 
this theorem and to apply it directly to single channels or to a 
collection of channels. The next statement makes this precise. 

Corollary 42 ($(d_l,d_r,L,w)$ Universally Achieves Capacity):
The $(d_l,d_r,L,w)$ ensemble is universally capacity achieving
for the class of BMS channels. More precisely, assume we
are given $\epsilon > 0$ and a target rate $R$. Let $C(R)$ denote the set
of BMS channels of capacity at least $R$. To each $c \in C(R)$
associate the family $\{c_h\}_{h=0}^{1}$ by defining

\[ c_h = \begin{cases} \frac{1}{H(c)}\big[(H(c)-h)\Delta_{+\infty} + h\,c\big], & 0 \le h \le H(c), \\ \frac{1}{1-H(c)}\big[(h-H(c))\Delta_{0} + (1-h)\,c\big], & H(c) \le h \le 1. \end{cases} \]

Then there exists a set of parameters $(d_l,d_r,L,w)$ so that

\[ R(d_l,d_r,L,w) \ge R - 4\epsilon, \qquad \inf_{c \in C(R)} h^{BP}(d_l,d_r,L,w,\{c_h\}) \ge 1 - R + \epsilon. \]

Since for each $c \in C(R)$ the associated family $\{c_h\}_{h=0}^{1}$ is
ordered by degradation, this implies that we can transmit with
this ensemble reliably over each of the channels in $C(R)$ at a
rate of at least $R - 4\epsilon$, i.e., arbitrarily close to the Shannon
limit.
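
The interpolating family of Corollary 42 is built so that $c_h$ has entropy exactly $h$. The following minimal sketch checks this by tracking only the mixture weights, using the fact that entropy is linear in the density and that $\Delta_{+\infty}$ and $\Delta_0$ have entropy 0 and 1, respectively. The function names are hypothetical, and the value 0.4694 is only an example entropy.

```python
def family_weights(h: float, Hc: float):
    """Weights of c_h on (Delta_{+inf}, c, Delta_0) for a channel c with entropy Hc."""
    if not (0.0 < Hc < 1.0 and 0.0 <= h <= 1.0):
        raise ValueError("need 0 < Hc < 1 and 0 <= h <= 1")
    if h <= Hc:
        # mix the perfect channel with c
        return ((Hc - h) / Hc, h / Hc, 0.0)
    # mix the useless channel with c
    return (0.0, (1.0 - h) / (1.0 - Hc), (h - Hc) / (1.0 - Hc))


def family_entropy(h: float, Hc: float) -> float:
    """Entropy of c_h, computed by linearity from the mixture weights."""
    w_perfect, w_c, w_useless = family_weights(h, Hc)
    return w_perfect * 0.0 + w_c * Hc + w_useless * 1.0


if __name__ == "__main__":
    for h in (0.0, 0.2, 0.4694, 0.8, 1.0):
        print(h, family_weights(h, 0.4694), family_entropy(h, 0.4694))
```

Both branches agree at $h = H(c)$, where the family passes through $c$ itself.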

Proof: Fix the ratio of the degrees so that $R - 3\epsilon \le 1 - d_l/d_r \le R - 2\epsilon$.
Note that for each $c \in C(R)$ the
constructed family $\{c_h\}$ is piece-wise smooth, ordered and
complete. By applying Theorem [41] to each such channel
family we conclude that for admissible parameters (i.e., as long
as we choose the degrees and the connection width sufficiently
large) the threshold of the ensemble $(d_l,d_r,L,w)$ for the
given channel family is at least $h^{A}(d_l,d_r,\{c_h\}) - f(d_l,d_r,w)$,
where $h^{A}(d_l,d_r,\{c_h\})$ is the area threshold and $f(d_l,d_r,w)$
is a universal quantity, i.e., a quantity which does not depend
on the channel family and which converges to 0 when $w$
tends to infinity. Further, we know from Lemma [29] that
the area threshold $h^{A}(d_l,d_r,\{c_h\})$ approaches the Shannon
threshold uniformly over all BMS channels for increasing
degrees. By our choice of $(d_l,d_r)$ the Shannon threshold is
$1 - (1 - d_l/d_r) \ge 1 - R + 2\epsilon$. Therefore, by first choosing
sufficiently large degrees $(d_l,d_r)$, and then a sufficiently large
connection width $w$, we can ensure that the BP threshold
is at least $1 - R + \epsilon$. Finally, by choosing the constellation
length $L$ sufficiently large, we can ensure that the rate loss we
incur with respect to the design rate of the underlying ensemble
is sufficiently small so that the design rate of the coupled
ensemble is at least $R - 4\epsilon$. ■
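
The first step of the proof, fixing the degree ratio inside the window $R - 3\epsilon \le 1 - d_l/d_r \le R - 2\epsilon$, can be sketched as a small search. The helper below is hypothetical: it returns the pair with the smallest $d_r$ inside the window, whereas the proof additionally needs the degrees to be sufficiently large, which in practice means scaling the returned pair by a common factor. The lower limit $d_l \ge 3$ is an assumption of this sketch.

```python
from fractions import Fraction


def pick_degrees(R: Fraction, eps: Fraction, min_dl: int = 3, max_dr: int = 200):
    """Return (dl, dr) with smallest dr such that R-3eps <= 1 - dl/dr <= R-2eps."""
    lo, hi = R - 3 * eps, R - 2 * eps
    for dr in range(min_dl + 1, max_dr + 1):
        for dl in range(min_dl, dr):
            design_rate = 1 - Fraction(dl, dr)
            if lo <= design_rate <= hi:
                return dl, dr
    return None  # no admissible ratio found up to max_dr


if __name__ == "__main__":
    R, eps = Fraction(1, 2), Fraction(1, 50)
    dl, dr = pick_degrees(R, eps)
    # Shannon threshold of the uncoupled ensemble is dl/dr >= 1 - R + 2*eps
    print(dl, dr, Fraction(dl, dr) >= 1 - R + 2 * eps)
```

Exact rational arithmetic is used so that the window test is not affected by floating-point rounding at the boundaries.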

Corollary 43 (Universally Capacity Achieving Codes):
Assume we are given $\epsilon > 0$ and a target rate $R$. Let $C(R)$
denote the set of BMS channels of capacity at least $R$. Then
there exists a set of parameters $(d_l,d_r,L,w)$ of rate at least
$R - 5\epsilon$ with the following property. Let $C(n)$ be an element
of $(d_l,d_r,L,w)$ with blocklength $n$, where we assume that $n$
only goes over the subsequence of admissible values. Then

\[ \lim_{n\to\infty} E_{C(n) \in (d_l,d_r,L,w)}\Big[\mathbb{1}_{\{\sup_{c \in C(R)} P_b^{BP}(C(n),c) \le \epsilon\}}\Big] = 1. \]

In words, almost all codes in $(d_l,d_r,L,w)$ of sufficient length
are good for all channels in $C(R)$.

Proof: Note that according to Lemma [13] the space
of $|D|$ distributions endowed with the Wasserstein metric is
compact, and hence so is $C(R)$. Hence there exists a finite
set of channels, denote it by $\{c_i\}$, so that each channel in
$C(R)$ is within a (Wasserstein) distance at most $\delta$ from the set
$\{c_i\}$. We will fix the value of $\delta$ shortly.

Let us modify the set $\{c_i\}$ so that $C(R)$ is not only close to
$\{c_i\}$ but is also "dominated" by it. For each $c \in \{c_i\}$, define

\[ |\tilde{c}|(y) = \begin{cases} \sqrt{\delta} + (1-\sqrt{\delta})\,|c|(y), & 0 \le y \le z^*(|c|), \\ 1, & z^*(|c|) < y \le 1, \end{cases} \]

where $z^*(|c|)$ is the supremum of all $z$ so that $\int_z^1 (1 - |c|(y))\,dy = \sqrt{\delta}$.
If no such $z \in [0,1]$ exists then set $z^*(|c|) = 0$.
We claim that for any $a$ so that $d(a, c) \le \delta$, $a \prec \tilde{c}$.
In other words we claim that $\int_z^1 |a|(y)\,dy \le \int_z^1 |\tilde{c}|(y)\,dy$ for
any $z \in [0,1]$.

For $z^*(|c|) \le z \le 1$, $\int_z^1 |\tilde{c}|(y)\,dy = 1 - z$, the maximum
possible, and hence this integral is at least as large as
$\int_z^1 |a|(y)\,dy$. Consider therefore the range $0 \le z \le z^*(|c|)$.
In this case

\begin{align*}
\int_z^1 |\tilde{c}|(y)\,dy &\overset{(a)}{\ge} \sqrt{\delta}(1-z) + (1-\sqrt{\delta})\int_z^1 |c|(y)\,dy \\
&= \int_z^1 |c|(y)\,dy + \sqrt{\delta}\int_z^1 (1-|c|(y))\,dy \overset{(b)}{\ge} \int_z^1 |c|(y)\,dy + \delta \\
&\overset{(c)}{\ge} \int_z^1 |c|(y)\,dy + \int_0^1 \big||a|(y) - |c|(y)\big|\,dy \ge \int_z^1 |a|(y)\,dy.
\end{align*}

In (a) we use the definition of $|\tilde{c}|(y)$. To obtain (b) we use
that for $z \le z^*(|c|)$ we have $\int_z^1 (1-|c|(y))\,dy \ge \int_{z^*(|c|)}^1 (1-|c|(y))\,dy = \sqrt{\delta}$.
Finally, in (c) we use the alternative definition
of the Wasserstein distance in Lemma [13].

Further,

\begin{align*}
d(\tilde{c}, a) &\le d(\tilde{c}, c) + d(c, a) \le \int_0^1 \big||\tilde{c}|(y) - |c|(y)\big|\,dy + \delta \\
&\le \int_0^{z^*(|c|)} \big|\sqrt{\delta}(1-|c|(y))\big|\,dy + \int_{z^*(|c|)}^1 (1-|c|(y))\,dy + \delta \le 3\sqrt{\delta}.
\end{align*}

In words, any density $a$ which was close to $c$ is still close
to $\tilde{c}$. We have therefore the set $\{\tilde{c}_i\}$ of channels which
"cover" and "dominate" the set of channels $C(R)$ in the sense
that for every $a \in C(R)$ there exists an element $\tilde{c}_i \in \{\tilde{c}_i\}$
so that $d(a, \tilde{c}_i) \le 3\sqrt{\delta}$ and $a \prec \tilde{c}_i$. This implies in particular
that mini 1 - H(£i) > R - h 2 {\VS) > R — e, where in the 
last step we use the relation between the Wasserstein distance 
and entropy given by (ix) in Lemma Qj] also we assumed that 
we fixed S so that /^(fv^) < e. In words, all channels in 
{£i}i=i have capacity at least R — e. 

From Corollary [42] we know that, given a finite set of
channels from $C(R - \epsilon)$, there exists a set of parameters
$(d_l,d_r,L,w)$ which has rate at least $R - 5\epsilon$ and BP threshold
at least $1 - R + 2\epsilon$ universally for the whole family. Since
each element of $\{\tilde{c}_i\}$ is an element of $C(R - \epsilon)$ this
ensemble "works" in particular for all channels $\{\tilde{c}_i\}$ and
these channels "dominate" all channels in $C(R)$ in the sense
that for each element $c \in C(R)$ there is an element of $\{\tilde{c}_i\}$
which is degraded wrt $c$.

For each element $\tilde{c}_i$ we know by standard concentration
theorems that "almost all" elements of the ensemble have a bit 
error rate of the BP decoder going to zero B31 . (62|. Since the 
"almost all" means all but an exponentially (in the blocklength) 
small subset and since we only have a finite number of channel 
families, this implies that almost all codes in the ensemble 
work for all the channels in the finite subset. But since the 
finite subset dominates all channels in C(R) this implies that 
almost all codes work for all channels in this set. ■ 

D. Proof of Main Result - Theorem [41]

We start by proving some basic properties which any spatial 
FP has to fulfill. Since we are considering a symmetric 
ensemble (in terms of the spatial arrangement) it will be useful 
to consider "one-sided" FPs. 

Definition 44 (FPs of One-Sided DE): We say that $x$ is a
one-sided FP (of DE) with channel $c$ if (12) is fulfilled for
$i \in [-N, 0]$ with $x_i = \Delta_{+\infty}$ for $i < -N$. We say that the FP
has a free boundary condition if $x_i = x_0$ for $i > 0$. We say
that it has a forced boundary condition if $x_i = \Delta_0$ for $i > 0$.
Lastly, we say that it has an increasing boundary condition if
$x_{i-1} \prec x_i$ for $i > 0$, where $x_i$, for $i \ge 1$, are fixed but arbitrary
symmetric densities. ■

Definition 45 (Proper One-Sided FPs): We say that $x$ is
non-decreasing if $x_i \prec x_{i+1}$ for $i = -N, \dots, -1$. Let $(c, x)$
be a non-trivial and non-decreasing one-sided FP (with any
boundary condition). As a short hand, we then say that $(c, x)$
is a proper one-sided FP. Figure [6] shows an example. ■

Definition 46 (One-Sided Forward DE and Schedules):
Similar to Definition [35] one can define one-sided forward
DE by initializing all sections with $\Delta_0$ and by applying DE
according to an admissible schedule. ■

There are two key ingredients of the proof. The first
ingredient is to show that any one-sided spatial FP which is
increasing, "small" on the left, and "not too small" and "flat"
on the right must have a channel parameter very close to the
area threshold $h^{A}$. This is made precise in (the Saturation)
Theorem [47].

The second key ingredient is to show the existence of such
a one-sided FP $(c^*, x^*)$. Figure [7] shows a typical (two-sided)
such example. This is accomplished in (the Existence) Theorem [48].
Once these two theorems have been established, the
proof of our main theorem is rather short and straightforward.

Fig. 6. A proper one-sided FP $(c, x)$ with free boundary condition for
the ensemble $(d_l = 3, d_r = 6, N = 16, w = 3)$ and the channel
$c$ = BAWGNC($\sigma$) with $\sigma = 1.03978$. We have $H(c) = 0.46940$ and
$H(x) = 0.17$. The height of the vertical bar at section $i$ is equal to $H(x_i)$.

Fig. 7. Unimodal FP of the $(d_l = 3, d_r = 6, L = 16, w = 3)$ ensemble
for the BAWGNC($\sigma$) with $\sigma = 0.9480$ (channel entropy $\approx 0.4789$). The
constellation has entropy equal to 0.2. The bottom figure plots the entropy of
the density at each section. Notice the small values towards the boundary, a
fast transition, and essentially constant values in the middle. The top figure
shows the actual densities at sections ±12, ±8, ±4, 0. Notice that for densities
towards the boundary the mass shifts towards the "right," indicating a high
reliability. Also plotted in the middle figure relating to section 0 is the BP
forward DE density of the (3, 6)-regular ensemble at $\sigma = 0.9480$. The density
is right on the top of the density at section 0 of the coupled-code ensemble,
i.e., these two densities are visually indistinguishable. The density in section
±4 is also "close" to the density at section 0. Thus in the flat part, the densities
become close to the BP density of the underlying ensemble.

Theorem 47 (Saturation): Fix r G (0, 1) and let (di, d r , w) 
be admissible, with r = 1 — 4 s -, in the sense of conditions {n]), 
difiV (jv), dvl]>, ( TviTb and (Iviiit of Definition [40] Let (c* , x* ) 
be a proper one-sided FP on $[-N, 0]$, with forced boundary
condition, so that for some $\delta > 0$, $2(w-1) \le L$, and $L + w \le K \le N$
the following conditions hold.

(i) Constellation is close to $\Delta_{+\infty}$ "on the left":

\[ \mathfrak{B}(x^*_{-N+L}) \le \delta. \]

(ii) Constellation is not too small "on the right":

\[ \mathfrak{B}(x^*_{-K}) \ge x_u(1). \]

Then

\[ |H(c^*) - h^{A}(d_l,d_r,\{c_h\})| \le c(d_l,d_r,\delta,w,K,L). \]

Here $c(d_l,d_r,\delta,w,K,L)$ is a function which can be made
arbitrarily small by choosing $\delta$ sufficiently small, $w$ sufficiently
large, and $L$ and $K$ sufficiently large compared to $w$. (This
implies of course that the constellation length $N$ is also chosen
sufficiently large.) More precisely,

\[ f(d_l,d_r,w) = \lim_{\delta \to 0}\, \lim_{L,K \to \infty} c(d_l,d_r,\delta,w,K,L), \]

where $f(d_l,d_r,w)$ is the function defined in Theorem [41].

The proof of Theorem [47] can be found in Appendix [I]. The
proof of the following Theorem [48] is contained in Appendix [K].

Theorem 48 (Existence of FP): Fix r <G (0,1) and let 
(di,d r ,w) be admissible in the sense of conditions (01, ©, 
(fllTb . divb . ©, (JviJ in Definition [40] with r = I - f-. Let 
{c CT }i =0 be a smooth, ordered and complete channel family. 

In the sequel, N(di,d r) w) is a positive constant which 
depends on the ensemble but not the channel or the channel 
family and c(di,d r ) is a positive constant which depends on 
di and d r , but not on the channel c, the channel family, N or 
w. 

For any N > N(di, d r , w) and < 5 < , there exists 
a proper one-sided FP (c*,x*) on [— N, 0] with parameters 
(di,d r ,w) and with forced boundary condition so that the 
following conditions are fulfilled: 

(i) Constellation is close to A. +oc "on the left": Let 

wc(dud r )- 



Nx-{N+1){-- , N 1)s y 



(N + 1)5 

Then $\mathfrak{B}(x^*_i) \le \delta$ for $i \in [-N, -N + N_1 - 1]$.
(ii) Constellation is not too small "on the right": Let 

W2 = (W+1)( M1)_^ ; 

Then $\mathfrak{B}(x^*_i) \ge x_u(1)$ for $i \in [-N_2, 0]$.

Discussion: In words, the theorem says that for any fixed
$w \in \mathbb{N}$ and $\delta > 0$, if we pick $N$ sufficiently large, we can
construct a FP constellation which is small on the left for a
linear fraction of the total length and reasonably large on the
right, also for a linear fraction of the total length.

Proof of Theorem [41]: We are ready to prove the remaining
statement of our main theorem, i.e., (16). Let $(d_l,d_r)$ and $w$
be admissible in the sense of the conditions of Definition [40]
that are required by Theorems [47] and [48], and set $r = 1 - d_l/d_r$.
We want to show that

\[ h^{BP} \ge h^{A} - f(d_l,d_r,w), \]

where $f(d_l,d_r,w)$ is the function defined in Theorem [41].

First note that h BP is a decreasing function of L. This follows 
by comparing DE for two constellations of increasing size and 
verifying that DE of the larger constellation "dominates" (in 
the sense of degradation) DE of the smaller constellation. In 
the ensuing arguments we will take advantage of this fact - if 
we can lower bound the threshold for a particular constellation 
size then we will have automatically lower bounded also the 
threshold for all smaller constellation sizes. This is convenient 
since at several steps we will need to pick L "sufficiently" 
large, where the restrictions on the constellation size stem from 
our use of simple extremes of information combining bounds. 

Choose a channel, call it $c$, from the channel family
$\{c_h\}$ with $H(c) < h^{A} - f(d_l,d_r,w)$, where $f(d_l,d_r,w)$ is the
function defined in Theorem [41]. We will show that for any admissible
ensemble $(d_l,d_r,L,w)$, where $L$ is chosen "sufficiently large,"
the forward DE process converges to the trivial FP. By our
remarks above concerning the monotonicity of the threshold
in terms of $L$, this implies that for any length $L$, DE converges
to the trivial FP, hence proving our main statement.
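
To make "forward DE converges to the trivial FP" concrete, here is a minimal sketch specialised to the BEC, where densities reduce to erasure probabilities. This specialisation, the scalar update rule $x_i = \epsilon\,(1 - \frac{1}{w}\sum_j (1 - \frac{1}{w}\sum_k x_{i+j-k})^{d_r-1})^{d_l-1}$, and all function names are assumptions of the sketch; the general BMS case treated in this paper requires tracking full densities. Sections outside the chain are fixed to 0 (perfect knowledge), and all sections inside are initialised to 1.

```python
def _at(x, i):
    """Erasure probability at section i; sections outside the chain are known (0)."""
    return x[i] if 0 <= i < len(x) else 0.0


def coupled_de_bec(eps, dl=3, dr=6, L=16, w=3, max_iter=20000, tol=1e-12):
    """Forward DE for a (dl, dr, L, w) chain on the BEC(eps), sections -L..L."""
    x = [1.0] * (2 * L + 1)
    for _ in range(max_iter):
        new = []
        for i in range(len(x)):
            acc = 0.0
            for j in range(w):
                avg = sum(_at(x, i + j - k) for k in range(w)) / w
                acc += (1.0 - avg) ** (dr - 1)
            new.append(eps * (1.0 - acc / w) ** (dl - 1))
        change = max(abs(a - b) for a, b in zip(new, x))
        x = new
        if change < tol:  # fixed point reached numerically
            break
    return x


def uncoupled_de_bec(eps, dl=3, dr=6, iters=5000):
    """Forward DE for the uncoupled (dl, dr)-regular ensemble on the BEC(eps)."""
    x = 1.0
    for _ in range(iters):
        x = eps * (1.0 - (1.0 - x) ** (dr - 1)) ** (dl - 1)
    return x


if __name__ == "__main__":
    eps = 0.46  # above the uncoupled (3,6) BP threshold, below its area threshold
    print("uncoupled FP:", uncoupled_de_bec(eps))
    print("coupled max :", max(coupled_de_bec(eps)))
```

At the chosen erasure probability the uncoupled recursion gets stuck at a non-trivial FP, while the coupled chain is decoded from its boundaries inwards.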

As stated in Theorem [47], $f(d_l,d_r,w)$ is the limit of
$c(d_l,d_r,\delta,w,K,L)$ when first $L$ and $K$ tend to infinity and
then $\delta$ tends to zero. We claim that, for the fixed parameters
$(d_l,d_r,w)$, for any $\delta > 0$ there exist $L, K, N \in \mathbb{N}$, sufficiently
large, so that



N(di,d r ,w) < N, 
2(w — 1)<L, 

L < (iV+ i)(l_^41; i 

- v '\2 (N+1)5J' 



(19) 
(20) 

(21) 



L + w < K < {N + 1)(^- W ;} dl \ dr }) < N—L, (22) 



V 4 



(N+l)5 



H(c) < h A - c(di,d r , S, w, K, L), 



(23) 



where $N(d_l,d_r,w)$ and $c(d_l,d_r)$ are the constants given in
Theorem [48]. To fulfill (23), as discussed in Theorem [47],
$c(d_l,d_r,\delta,w,K,L)$ is a continuous function in its parameters
which converges to $f(d_l,d_r,w)$ if we let $\delta$ tend to 0 and let $K$ and $L$
tend to infinity. Therefore, by choosing $\delta$ sufficiently small,
and $L$ and $K$ sufficiently large we fulfill (23). By a proper
such choice we also fulfill (20) and the first inequality of
(22). Now note that increasing $N$ loosens all above conditions.
In particular, for any $\delta > 0$ and $K, L, w \in \mathbb{N}$, by choosing
$N$ sufficiently large we fulfill (19), (21), and the last two
inequalities of (22). We have now fixed all parameters.

Let $(c^*, x^*)$ be the proper one-sided FP on $[-N, 0]$ whose
existence is promised by Theorem [48]. Recall that it has a
forced boundary condition, i.e., it is a FP if we assume
that $x^*_i = \Delta_0$ for $i > 0$. Furthermore, from (21) and (22),
and since $(c^*, x^*)$ is a proper one-sided FP, we satisfy the
conditions of Theorem [47]. Thus we conclude that
$H(c^*) \ge h^{A} - c(d_l,d_r,\delta,w,K,L)$.

Next, create from the FP $(c^*, x^*)$ on $[-N, 0]$ the constellation
$x$ on $[-N, N]$ by appending to $x^*$, $N$ densities $\Delta_0$ on
the right which are part of the constellation and by defining
$x_i = \Delta_0$ for $i > N$ (forced boundary condition). Note that
this redefined constellation $(c^*, x)$ is not a FP since it does
not fulfill the FP equations for the positions $i \in [1, N]$.

Initialize DE with $x$, i.e., set $x^{(0)} = x$. Apply forward DE
to $x$ with the channel $c$ as chosen previously (cf. (23)). Call
the resulting constellation, after $\ell$ steps of DE, $x^{(\ell)}$.

We claim that for all $\ell \ge 0$, $x^{(\ell)}$ is spatially monotonically
increasing, i.e., $x^{(\ell)}_i \prec x^{(\ell)}_{i+1}$ for all $i \in [-N, N]$, and that
$x^{(\ell)}$ is monotonically decreasing as a function of $\ell$, i.e.,
$x^{(\ell+1)} \prec x^{(\ell)}$.

To prove the first claim recall that $x^{(0)} = x$, which is
monotonically increasing and has forced boundary condition
on the right. But DE preserves the monotonicity so that for
every $\ell \ge 0$, $x^{(\ell)}_i \prec x^{(\ell)}_{i+1}$ for all $i \in [-N, N]$.

Consider now the second claim. Assume we run one step of
DE on $x^{(0)}$ with the channel $c^*$. Then for $i \in [-N, 0]$,
$x^{(1)}_i = x^{(0)}_i$ by construction. For $i \in [1, N]$,
$x^{(1)}_i \prec c^* \prec \Delta_0 = x^{(0)}_i$.
In words, for each $i \in [-N, N]$ the constellation is decreasing.
It is therefore also decreasing if we run one step of DE with
the channel $c \prec c^*$. As a consequence, since DE preserves the
order imposed by degradation, we must have $x^{(\ell+1)} \prec x^{(\ell)}$ for
all $\ell \ge 0$. Thus the process must converge to a FP of DE with
forced boundary condition. Call this resulting FP $x^{(\infty)}$.

We claim that $\mathfrak{B}(x^{(\infty)}_{L+1}) < x_u(1)$. Assume to the contrary
that this is not true. Then we can apply Theorem [47] to
$(c, x^{(\infty)})$ to arrive at a contradiction. Let us discuss this
point in detail. Since $x^{(\ell)}_i \prec x^{(\ell)}_{i+1}$ for all $\ell$ we must have
$x^{(\infty)}_i \prec x^{(\infty)}_{i+1}$ for all $i \in [-N, N]$. Combined with the fact
that $x^{(\infty)}_i = \Delta_0$ for $i > N$, we conclude that $(c, x^{(\infty)})$
is a proper one-sided FP on $[-N, N]$ with forced boundary
condition. Furthermore, from (19), (20), (21) and (22) we
see that $x^{(\infty)}$ satisfies all hypotheses of Theorem [47]. More
precisely, by assumption the constellation is large for the last
$N - L$ sections. Hence from the choice of $K$ as given by
(22) the section which lies $K$ positions to the left of the right
boundary has a Bhattacharyya parameter of at least $x_u(1)$, as
required by hypothesis (ii) of Theorem [47]. From (21) it is clear
that $\mathfrak{B}(x^{(\infty)}_{-N+L}) \le \delta$. As a consequence, from Theorem [47]
we conclude that $H(c) \ge h^{A} - c(d_l,d_r,\delta,w,K,L)$. But this
contradicts our initial assumption on $H(c)$ (cf. (23)).

We are now ready to prove our main claim. Consider a
coupled ensemble on $[1, L+1]$ with parameters $(d_l,d_r,w)$.
More precisely, the coupled ensemble has sections from $[1, L+1]$
with $i \notin [1, L+1]$ set to $\Delta_{+\infty}$. Initialize all sections in
$[1, L+1]$ to $\Delta_0$. Call this constellation $y^{(0)}$. Run forward
DE with the channel $c$ on $y^{(0)}$, call the result $\{y^{(\ell)}\}$ and let
$y^{(\infty)}$ denote the limit, which is a FP. We have
$y^{(\infty)}_i \prec x^{(\infty)}_i$, $i \in [1, L+1]$, since
$y^{(0)}_i \prec x^{(0)}_i$ for $i \in [1, L+1]$ and
$y^{(0)}_i = \Delta_{+\infty} \prec x^{(0)}_i$ for $i \notin [1, L+1]$ and DE preserves the ordering.
Therefore $\mathfrak{B}(y^{(\infty)}_i) \le \mathfrak{B}(x^{(\infty)}_{L+1}) < x_u(1)$, for all $i \in [1, L+1]$.
Let $\mathfrak{B}_j$, for some $j \in [1, L+1]$, denote the maximum of
the Bhattacharyya parameter over all sections of $y^{(\infty)}$. From
extremes of information combining we have

\begin{align*}
\mathfrak{B}_j = \mathfrak{B}(y^{(\infty)}_j) &\le \mathfrak{B}(c)\big(1 - (1 - \mathfrak{B}_j)^{d_r-1}\big)^{d_l-1} \\
&\le \big(1 - (1 - \mathfrak{B}_j)^{d_r-1}\big)^{d_l-1}.
\end{align*}

The last inequality implies that $\mathfrak{B}_j = 0$ since $\mathfrak{B}_j \in [x_u(1), 1]$
is excluded. From Lemma [13] we conclude that

\[ d(y^{(\infty)}_i, \Delta_{+\infty}) \le \mathfrak{B}(y^{(\infty)}_i) \le \mathfrak{B}_j = 0, \quad \text{for all } i \in [1, L+1]. \]

In other words, $y^{(\infty)} = \Delta_{+\infty}$, as claimed.
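
The last step of the proof relies on the fact that a value $\mathfrak{B}$ satisfying $\mathfrak{B} \le (1 - (1 - \mathfrak{B})^{d_r-1})^{d_l-1}$ and lying below $x_u(1)$ must be 0. The following sketch locates $x_u(1)$ numerically as the smallest strictly positive solution of $x = (1 - (1-x)^{d_r-1})^{d_l-1}$ and illustrates the two regimes by iterating the map. The bisection bracket and the function names are assumptions chosen for the (3, 6) example.

```python
def bec_map(x: float, dl: int, dr: int) -> float:
    """BEC density-evolution map at channel parameter 1."""
    return (1.0 - (1.0 - x) ** (dr - 1)) ** (dl - 1)


def unstable_fp(dl: int, dr: int, lo: float = 1e-9, hi: float = 0.5) -> float:
    """Bisection for the non-zero solution of x = bec_map(x) inside (lo, hi)."""
    g = lambda x: bec_map(x, dl, dr) - x
    if not (g(lo) < 0.0 < g(hi)):
        raise ValueError("bracket does not contain a sign change")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def iterate(x: float, dl: int, dr: int, steps: int = 50) -> float:
    """Apply the map repeatedly, starting from x."""
    for _ in range(steps):
        x = bec_map(x, dl, dr)
    return x


if __name__ == "__main__":
    xu = unstable_fp(3, 6)
    print("x_u(1) ~", xu)
    print("start below:", iterate(0.9 * xu, 3, 6))
    print("start above:", iterate(1.1 * xu, 3, 6))
```

Below $x_u(1)$ the map is strictly contracting towards 0, which is why the interval $[x_u(1), 1]$ is the only alternative that needs to be excluded.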



E. Conclusion and Outlook 

We have shown one can construct low-complexity coding 
schemes which are universal for the whole class of BMS 
channels by spatially coupling regular LDPC ensembles. Thus, 
we resolve a long-standing open problem of whether there 
exist low-density parity-check ensembles which are capacity- 
achieving using BP decoding. These ensembles are not only 
attractive in an asymptotic setting but also for applications and 
standards since they can easily be designed to have both, good 
thresholds and low error floors. In addition, these ensembles 
are universal in the sense that one and the same ensemble is 
good for the whole class of BMS channels, assuming that the 
channel is known at the receiver. In fact, we have shown the
stronger statement that almost all codes in such an ensemble
are good for all channels in this class. 

Let us discuss some open questions. 

Maxwell Conjecture: As a byproduct of our proof, we 
know that the MAP threshold of coupled ensembles is 
essentially equal to the area threshold of the uncoupled 
ensemble. In addition we know that the MAP threshold 
of the uncoupled ensemble is also upper bounded by the 
area threshold. The Maxwell conjecture states that in fact 
the MAP threshold of the uncoupled ensemble is equal 
to the area threshold. So if one can establish that the 
MAP threshold of the uncoupled ensemble is at least 
as large as the MAP threshold of the coupled ensemble, 
then the Maxwell conjecture would be proved. A natural
approach to resolve this issue is to use interpolation
techniques and it is likely that the Maxwell conjecture
can be proved in a way similar to what was done in [93]
for other graphical models.

Convergence Speed: As discussed previously, we only 
give weak bounds on the speed of convergence of the 
ensemble to the Shannon capacity (as a function of the de- 
grees, the constellation length L, as well as the coupling 
width w). Numerical evidence suggests much stronger 
results. Settling the question of the actual convergence 
speed is both challenging and interesting. 
Lifting of Restrictions: Our results apply only to suf- 
ficiently large degrees whereas numerical calculations 
indicate that the threshold saturation effect equally shows 
up for small degrees. This is a consequence of the fact 
that at many places we have used simple extremes of 
information combining bounds. With sufficient effort it is 
likely that one can extend the proof to many dds which 
are currently not covered by our statement. 
General Ensembles: In a similar vein, we restricted 
our investigation to regular ensembles to keep things 
simple, but the same technique applies in principle also to 
irregular or even structured ensembles. Again, depending 
on the structure of the underlying ensemble, much effort 
might be required to derive the necessary bounds.
Wiggle Size: Perhaps the weakest link in our derivation is 
the treatment of the connection width w. In our current 
statements this connection width has to be chosen large. 
Empirically, small such lengths, such as the extreme case 
w = 2 give already excellent results and by increasing w 
the convergence to the area threshold seems to happen 
exponentially fast. How to derive practically relevant 
bounds for such small values of w is an important open 
problem. 

Scaling: More generally, from a practical point of view,
what is needed is a firm understanding of how the
performance of such codes scales in each of the parameters
$d_l$, $d_r$, $L$, $M$, as well as $w$. Only then will it be possible
to design codes in a principled fashion.
Practical Issues: Further important topics are the design
of good termination schemes which mitigate the rate-loss,
a systematic investigation of how structure in the
interconnection pattern as well as the codes influences the
performance, and how to optimally choose the scheduling
(e.g., windowed decoding) to control the complexity of
the decoder [75].

General Models: As was discussed briefly in the intro- 
duction, the threshold saturation phenomenon has been 
empirically found to hold in a large variety of systems. 
This suggests that one should be able to formulate a 
rather general theory rather than finding a separate proof 
for each of these cases. For all one-dimensional systems
this has recently been accomplished in [109]. For
higher-dimensional or infinite-dimensional systems this is a
challenging open problem.

Appendix A
Entropy versus Bhattacharyya - Lemma [4]

Lemma 49 (Bounds on Binary Entropy Function): Let
$h_2(x) = -x\log_2(x) - (1-x)\log_2(1-x)$. Then for
$x \in [0, 1/2]$,

\begin{align}
h_2(x) &\ge 1 - (1-2x)^2, \tag{24} \\
h_2(x) &\le 2\sqrt{x(1-x)}, \tag{25} \\
h_2(x) &\le \frac{11}{4}\, x^{3/4}. \tag{26}
\end{align}

Proof: To prove (24), write

\begin{align*}
h_2(x) &\overset{\text{[110, Lemma II.1]}}{=} 1 - \frac{1}{2\ln 2}\sum_{n=1}^{\infty}\frac{(1-2x)^{2n}}{n(2n-1)} \\
&\ge 1 - (1-2x)^2\,\frac{1}{2\ln 2}\sum_{n=1}^{\infty}\frac{1}{n(2n-1)} = 1 - (1-2x)^2.
\end{align*}

Consider now (25). Set $g(z) = \big[2\sqrt{(1-x)x} - h_2(x)\big]_{x=(1-z)/2} = \sqrt{1-z^2} - h_2\big(\tfrac{1-z}{2}\big)$.
We want to show that $g(z) \ge 0$ for $z \in [0, 1]$. We have

\begin{align*}
g'(z) &= -\frac{z}{\sqrt{1-z^2}} + \frac{1}{2\ln 2}\ln\Big(\frac{1+z}{1-z}\Big), \\
g''(z) &= -\frac{1}{(1-z^2)^{3/2}} + \frac{1}{(1-z^2)\ln 2}.
\end{align*}

The following claims are straightforward to verify using the
explicit formulae for $g(z)$, $g'(z)$, and $g''(z)$: (i) $g(0) = g(1) = 0$,
(ii) $g'(0) = 0$, (iii) $g''(0) > 0$, (iv) $g''(z) = 0$ has exactly
one solution in $[0, 1]$.

Suppose there exists a $w$, $0 < w < 1$, so that $g(w) < 0$.
Then from (i), (ii) and (iii) we must have $g(z) = 0$ for at least
three distinct elements of $[0, 1]$. Rolle's theorem then implies
that $g'(z) = 0$ has at least two distinct solutions in $(0, 1)$
and hence at least three distinct solutions in $[0, 1]$ (since by
(ii) $g'(0) = 0$). Using Rolle's theorem again, this implies that
$g''(z) = 0$ has at least two solutions in $[0, 1]$, contradicting
(iv).

We prove d26i i along similar lines. Consider g(x) = jii — 



l W2 + x l°§2( a; )' where a = 4(1 + ln( 



l lln(2 ) 



)) ps 1.035 > 1. 



Note that $g(x) \le \frac{11}{4}x^{3/4} - h_2(x)$ for $x \in [0, \frac{1}{2}]$ (to verify this,
upper bound the term $-(1-x)\log_2(1-x)$ of the entropy
function by $x/\ln(2)$). So if we can prove that $g(x) \ge 0$ for
$x \in [0, \frac{1}{2}]$ then we are done.

Direct inspection of the quantities shows that $g(0) = 0$,
$g'(0+) = +\infty$, $g(x^*) = g'(x^*) = 0$, where $x^* = 0.05157$,
and $g(\frac{1}{2}) > 0$.
It follows that if there exists an $x \in [0, \frac{1}{2}]$ so that $g(x) < 0$
then $g(x)$ must have at least 4 roots in this range, therefore
by Rolle $g'(x)$ must have at least 3 roots, and again by Rolle
$g''(x)$ must have at least 2 roots. But an explicit check shows
that g"{x) = - 7 p 5 - + ^j^y = 0. So g"{x) = can only 



have a single solution. 
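
The three bounds of Lemma 49 are easy to sanity-check numerically. The following sketch evaluates the slack of (24), (25) and (26) on a uniform grid over $(0, 1/2]$; the grid size and the function names are arbitrary choices of this illustration, and such a check is of course not a substitute for the proof.

```python
import math


def h2(x: float) -> float:
    """Binary entropy function in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def bound_margins(n: int = 5000):
    """Smallest slack of the bounds (24), (25), (26) on a grid over (0, 1/2]."""
    m24 = m25 = m26 = float("inf")
    for k in range(1, n + 1):
        x = 0.5 * k / n
        h = h2(x)
        m24 = min(m24, h - (1.0 - (1.0 - 2.0 * x) ** 2))   # (24): lower bound
        m25 = min(m25, 2.0 * math.sqrt(x * (1.0 - x)) - h)  # (25): upper bound
        m26 = min(m26, 2.75 * x ** 0.75 - h)                # (26): upper bound
    return m24, m25, m26


if __name__ == "__main__":
    print(bound_margins())
```

Bounds (24) and (25) are tight at $x = 1/2$, so their smallest slack on the grid is zero up to rounding, whereas (26) has strictly positive slack on the whole grid.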

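Proof of Lemma [4]: Let $|a|$ denote the density in the $|D|$
domain. Then

\begin{align*}
H(|a|) = \int_0^1 h_2\Big(\frac{1-z}{2}\Big)\,|a|(z)\,dz &\overset{(24)}{\ge} \int_0^1 (1-z^2)\,|a|(z)\,dz \\
&\overset{\text{Jensen}}{\ge} \Big(\int_0^1 \sqrt{1-z^2}\,|a|(z)\,dz\Big)^2 = \mathfrak{B}(|a|)^2.
\end{align*}

This proves that $\mathfrak{B}(|a|)^2$ lower bounds $H(|a|)$. For the upper
bound we have

\begin{align*}
\mathfrak{B}(|a|) &= \int_0^1 \sqrt{1-z^2}\,|a|(z)\,dz \\
&= \int_0^1 \underbrace{\Big(\sqrt{1-z^2} - h_2\Big(\frac{1-z}{2}\Big)\Big)}_{\ge 0 \text{ by (25) with } x = \frac{1-z}{2}} |a|(z)\,dz + H(|a|) \ge H(|a|).
\end{align*}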
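
The sandwich $\mathfrak{B}(|a|)^2 \le H(|a|) \le \mathfrak{B}(|a|)$ can be illustrated on a discrete $|D|$-domain density, i.e., a finite mixture of point masses at values $z \in [0, 1]$. The sketch below assumes the kernels $h_2((1-z)/2)$ for the entropy and $\sqrt{1-z^2}$ for the Bhattacharyya parameter used in the proof above; the example mixture and the function names are arbitrary.

```python
import math


def h2(x: float) -> float:
    """Binary entropy function in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def entropy_and_bhattacharyya(masses):
    """masses: list of (z, weight) point masses of a |D|-domain density."""
    if abs(sum(w for _, w in masses) - 1.0) > 1e-12:
        raise ValueError("weights must sum to one")
    H = sum(w * h2((1.0 - z) / 2.0) for z, w in masses)
    B = sum(w * math.sqrt(1.0 - z * z) for z, w in masses)
    return H, B


if __name__ == "__main__":
    # 20% erasure-like mass (z = 0), 50% BSC-like mass, 30% perfect mass (z = 1)
    H, B = entropy_and_bhattacharyya([(0.0, 0.2), (0.5, 0.5), (1.0, 0.3)])
    print(B * B, H, B)
```

For a pure erasure-type mixture (masses only at $z = 0$ and $z = 1$) the upper bound holds with equality, since then $H = \mathfrak{B}$.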



Appendix B 
Upper Bound on BP Threshold - LemmaQT] 

Proof: We use ideas from extremes of information com- 
bining. We get an upper bound on the BP threshold by 
assuming that the densities at check nodes are from the BSC 
family and that densities at variable nodes are from the BEC 
family. 

Let $x$ represent the entropy of the variable-to-check message
and let $c$ denote the entropy of the channel. If for any $x \in [0, c]$

\[ h_2\big((1 - (1 - 2h_2^{-1}(x))^{d_r-1})/2\big) > (x/c)^{\frac{1}{d_l-1}} \tag{27} \]

then DE will not converge to the perfect decoding FP. The
left-hand side represents the minimum entropy at the output
of a check node which we can get if the input entropy is
$x$ (and this minimum is achieved if the input density is from
the BSC family). The right-hand side represents the maximum
input entropy which we can have at the input of a variable node
if we want an output entropy equal to $x$ (and this maximum
is achieved if the input density is from the BEC family). Note
that we can extend the inequality (27) to all $x \in [0, 1]$ without
changing the condition since for $x \in (c, 1]$, the right-hand side
is strictly bigger than 1, whereas the left-hand side is always
bounded above by 1.

The preceding condition is equivalent to saying that in order
for DE to succeed, we must have

\[ c \le \frac{x}{\big(h_2((1-(1-2h_2^{-1}(x))^{d_r-1})/2)\big)^{d_l-1}}, \]

for all $x \in [0, 1]$. We can also write this as

\[ c \le \frac{h_2(x)}{\big(h_2((1-(1-2x)^{d_r-1})/2)\big)^{d_l-1}}, \]

where $x \in [0, \frac{1}{2}]$.

We want to show that $c$ cannot be too large, i.e., we are
looking for an upper bound on $c$. Note that any value of $x$
gives a bound. Let us choose $x = \frac{1}{2\sqrt{d_r-1}}$. This gives the
bound

\[ c \le \frac{h_2\big(\frac{1}{2\sqrt{d_r-1}}\big)}{\Big(h_2\big(\frac{1-e^{-\sqrt{d_r-1}}}{2}\big)\Big)^{d_l-1}}. \]

To obtain the above inequality we first write $(1-2x)^{d_r-1}$ as
$\exp((d_r-1)\log(1-2x))$. For $x \in [0, \frac{1}{2}]$ we use the Taylor
expansion

\[ \log(1-2x) = -2x - \frac{(2x)^2}{2} - \frac{(2x)^3}{3} - \dots \le -2x = -\frac{1}{\sqrt{d_r-1}}. \]

Thus $\exp((d_r-1)\log(1-2x)) \le \exp(-\sqrt{d_r-1})$ and
$h_2((1-(1-2x)^{d_r-1})/2) \ge h_2\big(\frac{1-e^{-\sqrt{d_r-1}}}{2}\big)$. We want to simplify the
expression even further. Using [110, Lemma II.1] and bringing
out the first term in the summation,

\begin{align*}
h_2(x) &= 1 - \frac{1}{2\ln 2}\sum_{n=1}^{\infty}\frac{(1-2x)^{2n}}{n(2n-1)} \\
&= 1 - \frac{1}{2\ln 2}(1-2x)^2 - \frac{1}{2\ln 2}\sum_{n=2}^{\infty}\frac{(1-2x)^{2n}}{n(2n-1)} \\
&\ge 1 - \frac{1}{2\ln 2}(1-2x)^2 - \frac{1}{2\ln 2}\sum_{n=2}^{\infty}(1-2x)^{2n} \\
&= 1 - \frac{1}{2\ln 2}(1-2x)^2 - \frac{1}{2\ln 2}(1-2x)^4\sum_{n=0}^{\infty}\big((1-2x)^2\big)^n \\
&= 1 - \frac{2}{\ln 2}\Big(x-\frac{1}{2}\Big)^2 - \frac{8(x-1/2)^4}{\ln(2)\big(1-4(x-1/2)^2\big)}. \tag{28}
\end{align*}

Substituting $x = (1 - e^{-\sqrt{d_r-1}})/2$ we have

\begin{align*}
\Big(h_2\Big(\frac{1-e^{-\sqrt{d_r-1}}}{2}\Big)\Big)^{d_l-1} &\ge \Big(1 - \frac{1}{2\ln 2}e^{-2\sqrt{d_r-1}} - \frac{1}{2\ln 2}\frac{e^{-4\sqrt{d_r-1}}}{1-e^{-2\sqrt{d_r-1}}}\Big)^{d_l-1} \\
&\ge 1 - \frac{(d_l-1)}{2\ln 2}\,\frac{e^{-2\sqrt{d_r-1}}}{1-e^{-2\sqrt{d_r-1}}}.
\end{align*}

We conclude that

\[ c \le \frac{h_2\big(\frac{1}{2\sqrt{d_r-1}}\big)}{1 - \frac{(d_l-1)}{2\ln 2}\frac{e^{-2\sqrt{d_r-1}}}{1-e^{-2\sqrt{d_r-1}}}} \le \frac{h_2\big(\frac{1}{2\sqrt{d_r-1}}\big)}{1 - d_l\, e^{-2\sqrt{d_r-1}}}. \]
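
To get a feeling for the bound just derived, the following sketch compares three quantities for a few regular degree pairs: the exact ratio $h_2(x)/h_2((1-(1-2x)^{d_r-1})/2)^{d_l-1}$ at the chosen point $x = 1/(2\sqrt{d_r-1})$, its minimum over a grid of $x$ values, and the closed-form expression $h_2(\frac{1}{2\sqrt{d_r-1}})/(1 - d_l e^{-2\sqrt{d_r-1}})$. The function names and the grid resolution are assumptions of this illustration.

```python
import math


def h2(x: float) -> float:
    """Binary entropy function in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def ratio_at(x: float, dl: int, dr: int) -> float:
    """Upper bound on the channel entropy c obtained from a single value of x."""
    p = (1.0 - (1.0 - 2.0 * x) ** (dr - 1)) / 2.0
    return h2(x) / h2(p) ** (dl - 1)


def closed_form(dl: int, dr: int) -> float:
    """Final simplified bound h2(1/(2 sqrt(dr-1))) / (1 - dl exp(-2 sqrt(dr-1)))."""
    s = math.sqrt(dr - 1)
    return h2(1.0 / (2.0 * s)) / (1.0 - dl * math.exp(-2.0 * s))


def best_on_grid(dl: int, dr: int, n: int = 2000) -> float:
    """Tightest single-x bound over a uniform grid on (0, 1/2]."""
    return min(ratio_at(0.5 * k / n, dl, dr) for k in range(1, n + 1))


if __name__ == "__main__":
    for dl, dr in ((3, 6), (4, 8), (5, 10)):
        x0 = 1.0 / (2.0 * math.sqrt(dr - 1))
        print(dl, dr, best_on_grid(dl, dr), ratio_at(x0, dl, dr), closed_form(dl, dr))
```

The closed form is weaker than the exact ratio at the same $x$, as it must be, but it exposes the dependence on $d_r$ explicitly, which is what the asymptotic argument needs.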
Appendix C

Basic Properties of the Wasserstein Metric -
Lemma [13]
Proof: 

(i) Alternative Definitions: The equivalence of the basic 
definition (cf. Definition ITZb and the first alternative 
description is shown in (6.2) and (6.3) in [104]. The
equivalence of the first and second alternative descriptions
is shown in [111].

(ii) Boundedness: Follows directly from either of the two 
alternative descriptions. 

(iii) Metrizable and Weak Convergence: See [104, Theorem 6.9].

(iv) Polish Space: See [104, Theorem 6.18].

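(v) Convexity: We have

\begin{align*}
&\Big|\int_0^1 f(x)\big(\alpha|a|(x) + \bar{\alpha}|b|(x) - \alpha|c|(x) - \bar{\alpha}|d|(x)\big)\,dx\Big| \\
&\le \alpha\Big|\int_0^1 f(x)\big(|a|(x) - |c|(x)\big)\,dx\Big| + \bar{\alpha}\Big|\int_0^1 f(x)\big(|b|(x) - |d|(x)\big)\,dx\Big|.
\end{align*}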



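(vi) Regularity wrt $\circledast$: Let $f(\cdot)$ be Lip(1)[0, 1]. Without loss
of generality assume that $f(0) = 0$. Indeed, since we
consider the difference of densities, subtracting a constant
does not affect the integral. Define $\tilde{f}(x)$ for $x \in [-1, 1]$
by setting $\tilde{f}(x) = f(x)$ for $x \in [0, 1]$ and $\tilde{f}(x) = f(-x)$
for $x \in [-1, 0]$. Then $\tilde{f}(x)$ is Lip(1)[-1, 1] and also
$\tilde{f}(0) = 0$.

Let $\mathfrak{d} = a \circledast c$ and $\mathfrak{e} = b \circledast c$ be the D-domain
representation. Thus $d(\mathfrak{d}, \mathfrak{e})$ is characterized by

\begin{align*}
\int_0^1 f(z)\big(|\mathfrak{d}|(z) - |\mathfrak{e}|(z)\big)\,dz &\overset{(i)}{=} \int_{-1}^{1} \tilde{f}(z)\big(\mathfrak{d}(z) - \mathfrak{e}(z)\big)\,dz \\
&\overset{(ii)}{=} \int_{-1}^{1}\int_{-1}^{1} \big(a(x)c(y) - b(x)c(y)\big)\tilde{f}(g(x,y))\,dx\,dy \\
&\overset{(iii)}{=} \int_0^1 |c|(y)\,dy \int_0^1 \big(|a|(x) - |b|(x)\big)h(x,y)\,dx.
\end{align*}

In step (i) we use the construction of $\tilde{f}(z)$ along with the
relation between D and $|D|$ domains given by (29). We
defined $g(x,y) = \tanh(\tanh^{-1}(x) + \tanh^{-1}(y)) = \frac{x+y}{1+xy}$
and step (ii) follows by explicitly writing the variable
node convolution in the D-domain. In step (iii) we
defined

\[ h(x,y) = \frac{1}{4}\sum_{i \in \{\pm 1\},\, j \in \{\pm 1\}} \tilde{f}(g(ix, jy))(1+ix)(1+jy). \]

To obtain this equivalent formulation of the integral in
step (iii) we make use of the symmetry conditions of D-densities
and the implied relationship between D and $|D|$
densities for $y \in [0, 1]$,

\[ a(y) = \frac{1+y}{2}\,|a|(y), \qquad a(-y) = \frac{1-y}{2}\,|a|(y). \tag{29} \]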

We claim that h(x,y) is Lip(2)[0, 1] (as a function x). 
This will settle the proof of the lemma. Notice that 



h(x,y) is a linear combination of four functions. Let us 
consider a generic term. Writing <?(•, ) explicitly, we have 

\f(9(ix,jy))(l+ix)-f(g(iz,jy)){l+iz)\(l+jy) 



■|/(^^)(l+«)-/(^7?7.)(l+«)l(l+i») 



1+ijzy 



(i) 



1+ijzy 

(i-y 2 ) 



<(l + -)(l+^) ri + ^ )(1 + ^ y) 

+ 0- + jy)\(ix - iz )\ 



\(ix — iz)\ 



In (i) we use the Lipschitz continuity of /(■) and i 2 = 
j 2 = 1 to obtain the first term. We use |/(-)| < 1 to obtain 
the second term in (i). Indeed, since / is Lip(l)[0, 1] and 
f(0) = we must have |/(x)| = |/(|x|)| = |/(|x|) - 
/(0)| < |x| < 1. Also, in the above expression, we can 
replace \(ix — iz)\ by |x — z\. 

Now we sum over all possible i,j and divide by 4 to get 

\h(x,y)-h(z,y)\<j\x-z\x( ]T (l+jy) 

;e{±i}je{±i} 

+ V (l + ix)(l+jy)-, ^ ~J '- 

^ (l + ijxy)(l + ijzy) 

ie{±i},je{±i} 

Since J2je{±i} 3V = we have 
ie{±i},je{±i} 

Let us consider the other term. We split the sum into two 
parts, one sum over ij > and the other over ij < 0. 
We have 



^(l + ix)(l+jy)- 



ij<0 



_ = 2 - . 

(1 + ijxy)(l + ijzy) (1 - zy) 



E(l + ix)(l+jy)- 



(i-y 2 ) 



= 2 



1-y 2 



tj>0 



( 1 + ijxy) {1 + ijzy) (1 + zy) 



Adding the two we get the total contribution 

-y 2 

zy 1- zy 
Putting everything together we get 



2(1 - y 2 ) + ) = 4— 

v y '\1 + zy 1 - zv) 1 



Z 2y2 



< 4. 



\h(x,y)-h(z,y)\ < 2\x-z\. 
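The Lip(2) claim for $h(x,y)$ can be spot-checked numerically. The following is a minimal sketch, not part of the original proof: it assumes the even extension $\hat{f}$ described above, picks two illustrative test functions (`abs` and `sin(|t|)`, both Lip(1) with value 0 at the origin), and measures the largest finite-difference slope of $x \mapsto h(x,y)$ on a grid. The grid size and the cut-off `top` (which keeps the denominators $1 + ijxy$ away from zero) are arbitrary choices.

```python
import math

def g(x, y):
    # D-domain variable-node combination: tanh(atanh(x) + atanh(y)).
    return (x + y) / (1.0 + x * y)

def h(f_hat, x, y):
    # h(x, y) = 1/4 * sum over i, j in {+1, -1} of
    #           f_hat(g(i x, j y)) (1 + i x)(1 + j y)
    total = 0.0
    for i in (1, -1):
        for j in (1, -1):
            total += f_hat(g(i * x, j * y)) * (1 + i * x) * (1 + j * y)
    return total / 4.0

def max_slope(f_hat, n=100, top=0.99):
    # Largest finite-difference slope of x -> h(x, y) over a grid in [0, top]^2.
    grid = [top * k / n for k in range(n + 1)]
    worst = 0.0
    for y in grid:
        values = [h(f_hat, x, y) for x in grid]
        for k in range(n):
            slope = abs(values[k + 1] - values[k]) / (grid[k + 1] - grid[k])
            worst = max(worst, slope)
    return worst

if __name__ == "__main__":
    print(max_slope(abs))                          # even extension of f(x) = x
    print(max_slope(lambda t: math.sin(abs(t))))   # even extension of f(x) = sin(x)
```

Both printed slopes should stay at or below 2, in line with the bound just derived.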

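To get a good bound on $d(\mathsf{a}^{\circledast i} \circledast \mathsf{c}, \mathsf{b}^{\circledast i} \circledast \mathsf{c})$ in terms of
$d(\mathsf{a}, \mathsf{b})$ for $i \ge 2$ consider

$$\mathsf{c}' = \frac{1}{i}\sum_{j=1}^{i} \mathsf{a}^{\circledast i-j} \circledast \mathsf{b}^{\circledast j-1}$$

and note that the Wasserstein metric can be expressed
directly in the L-domain as

$$d(\mathsf{a},\mathsf{b}) = \int_0^{\infty} \Bigl|\int_{-x}^{x} (\mathsf{a}(y) - \mathsf{b}(y))\,dy\Bigr|\,\frac{2e^{-x}}{(1 + e^{-x})^2}\,dx.$$

Applying this representation we observe that

$$d(\mathsf{a} \circledast \mathsf{c} \circledast \mathsf{c}', \mathsf{b} \circledast \mathsf{c} \circledast \mathsf{c}') = \frac{1}{i}\,d(\mathsf{a}^{\circledast i} \circledast \mathsf{c}, \mathsf{b}^{\circledast i} \circledast \mathsf{c}),$$

which yields

$$d(\mathsf{a}^{\circledast i} \circledast \mathsf{c}, \mathsf{b}^{\circledast i} \circledast \mathsf{c}) \le 2i\,d(\mathsf{a},\mathsf{b}).$$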

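(vii) Regularity wrt $\boxast$: Let $f(x)$ be Lip(1)[0, 1]. Let $\mathfrak{d} = \mathsf{a} \boxast \mathsf{c}$
and $\mathfrak{e} = \mathsf{b} \boxast \mathsf{c}$ be the D-domain representation.

$$\int_0^1 f(z)\bigl(|\mathfrak{d}|(z)-|\mathfrak{e}|(z)\bigr)\,dz \overset{(a)}{=} \int_0^1\int_0^1 \bigl(|\mathfrak{a}|(x)|\mathfrak{c}|(y)-|\mathfrak{b}|(x)|\mathfrak{c}|(y)\bigr)f(xy)\,dx\,dy$$
$$= \int_0^1 dy\,|\mathfrak{c}|(y) \int_0^1 f(xy)\bigl(|\mathfrak{a}|(x)-|\mathfrak{b}|(x)\bigr)\,dx,$$

where step (a) follows since in the |D|-domain, check-node
convolution corresponds to a multiplication of the
values.

But note that if $f(x)$ is Lip(1)[0, 1] then $f(xy)$ is
Lip(|y|)[0, 1]. Hence,

$$d(\mathsf{a} \boxast \mathsf{c}, \mathsf{b} \boxast \mathsf{c}) \le d(\mathsf{a}, \mathsf{b}) \int_0^1 dy\,|\mathfrak{c}|(y)\,y$$
$$= d(\mathsf{a},\mathsf{b})\bigl(1 - 2\,\mathfrak{E}(\mathsf{c})\bigr) \qquad \Bigl(\text{since } \mathfrak{E}(\mathsf{c}) = \int_0^1 \tfrac{1-y}{2}\,|\mathfrak{c}|(y)\,dy\Bigr)$$
$$\le d(\mathsf{a},\mathsf{b})\sqrt{1-\mathfrak{B}^2(\mathsf{c})} \qquad \Bigl(\text{since } \mathfrak{B}(\mathsf{c}) \le 2\sqrt{\mathfrak{E}(\mathsf{c})(1-\mathfrak{E}(\mathsf{c}))}\Bigr).$$

Above, the relation between the Battacharyya and error
parameters can be obtained via extremes of information
combining (see [62]). Let us focus on the last part. To
get a good bound on $d(\mathsf{a}^{\boxast i}, \mathsf{b}^{\boxast i})$ in terms of $d(\mathsf{a}, \mathsf{b})$ for
$i \ge 2$, consider

$$\mathsf{c} = \frac{1}{i}\sum_{j=1}^{i} \mathsf{a}^{\boxast i-j} \boxast \mathsf{b}^{\boxast j-1},$$

and note that the Wasserstein metric can be expressed
directly in the D-domain as

$$d(\mathsf{a},\mathsf{b}) = \int_0^1 \Bigl|\int_{-x}^{x} (\mathsf{a}(y) - \mathsf{b}(y))\,dy\Bigr|\,dx.$$

Applying this representation, we observe that

$$d(\mathsf{a} \boxast \mathsf{c}, \mathsf{b} \boxast \mathsf{c}) = \frac{1}{i}\,d(\mathsf{a}^{\boxast i}, \mathsf{b}^{\boxast i}).$$

This yields

$$d(\mathsf{a}^{\boxast i}, \mathsf{b}^{\boxast i}) \le i\,d(\mathsf{a},\mathsf{b})\bigl(1-2\,\mathfrak{E}(\mathsf{c})\bigr)$$
$$= d(\mathsf{a},\mathsf{b})\sum_{j=1}^{i}\bigl(1-2\,\mathfrak{E}(\mathsf{a}^{\boxast i-j} \boxast \mathsf{b}^{\boxast j-1})\bigr)$$
$$= d(\mathsf{a},\mathsf{b})\sum_{j=1}^{i}\bigl(1-2\,\mathfrak{E}(\mathsf{a})\bigr)^{i-j}\bigl(1-2\,\mathfrak{E}(\mathsf{b})\bigr)^{j-1}$$
$$\le d(\mathsf{a},\mathsf{b})\sum_{j=1}^{i}\bigl(1-\mathfrak{B}^2(\mathsf{a})\bigr)^{\frac{i-j}{2}}\bigl(1-\mathfrak{B}^2(\mathsf{b})\bigr)^{\frac{j-1}{2}}.$$

(viii) Regularity wrt DE: Follows from properties (vi) and (vii).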



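(ix) Wasserstein Bounds Battacharyya and Entropy: Let $g$ be
a positive function on [0, 1] and let $f$ be a $C^2$ concave
decreasing function on [0, 1]. Then, for any $c \ge \|g\|_\infty$,

$$-\int_0^1 f'(x)g(x)\,dx \le c\Bigl(f\Bigl(1-\frac{1}{c}\int_0^1 g(z)\,dz\Bigr)-f(1)\Bigr).$$

Before proving the inequality let us use it to establish the
stated bounds. Set $g(z) = \bigl||\mathfrak{B}|(z)-|\mathfrak{A}|(z)\bigr|$. Then $\|g\|_\infty \le 1$
and $\int_0^1 g(z)\,dz = d(\mathsf{a}, \mathsf{b})$. Now, for the Battacharyya
bound let $f(z) = \sqrt{1 - z^2}$ and note

$$\mathfrak{B}(\mathsf{b}) - \mathfrak{B}(\mathsf{a}) = \int_0^1 f(z)\bigl(|\mathfrak{b}|(z)-|\mathfrak{a}|(z)\bigr)\,dz = -\int_0^1 f'(z)\bigl(|\mathfrak{B}|(z)-|\mathfrak{A}|(z)\bigr)\,dz \le -\int_0^1 f'(z)g(z)\,dz.$$

We obtain

$$|\mathfrak{B}(\mathsf{b}) - \mathfrak{B}(\mathsf{a})| \le \sqrt{1-(1-d(\mathsf{a},\mathsf{b}))^2} = \sqrt{d(\mathsf{a},\mathsf{b})}\,\sqrt{2-d(\mathsf{a},\mathsf{b})}.$$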

For the entropy case we set $f(z) = h_2\bigl(\frac{1-z}{2}\bigr)$. The same
argument as above yields

|H(b)-H(a)|<to(%^) 
1 



< ^y/d(a,b)y/2-d(a,b)- 
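We prove the stated inequality. Let us define

$$\bar{g}(z) = c\,\mathbb{1}_{\{z \ge 1 - \frac{1}{c}\int_0^1 g(x)\,dx\}},$$

where $c \ge \|g\|_\infty$. For each $z \in [0, 1]$ we have $\int_0^z (g(x) - \bar{g}(x))\,dx \ge 0$
with equality at $z = 1$. Hence,

$$0 \le -\int_0^1 f''(z)\Bigl(\int_0^z (g(x) - \bar{g}(x))\,dx\Bigr)dz = \int_0^1 f'(z)\bigl(g(z)-\bar{g}(z)\bigr)\,dz.$$

This yields

$$-\int_0^1 f'(z)g(z)\,dz \le -\int_0^1 f'(z)\bar{g}(z)\,dz = c\Bigl(f\Bigl(1-\frac{1}{c}\int_0^1 g(x)\,dx\Bigr) - f(1)\Bigr).$$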

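(x) Battacharyya Sometimes Bounds Wasserstein: Since the
cumulative |D|-distribution of $\Delta_0$ is equal to 1 on [0, 1],
the maximum possible value, we have

$$d(\mathsf{a},\Delta_0) = \int_0^1 \bigl(1-|\mathfrak{A}|(z)\bigr)\,dz = 1 - 2\,\mathfrak{E}(\mathsf{a}) \le \sqrt{1 - \mathfrak{B}(\mathsf{a})^2}. \quad (30)$$

Similarly, since the cumulative |D|-distribution of $\Delta_{+\infty}$ is
0 on [0, 1), we have

$$d(\mathsf{a},\Delta_{+\infty}) = \int_0^1 |\mathfrak{A}|(z)\,dz = 2\,\mathfrak{E}(\mathsf{a}) \le \mathfrak{B}(\mathsf{a}). \quad (31)$$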
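Properties (ix) and (x) are easy to try out on channels whose |D|-distribution is discrete. The sketch below is only an illustration under stated assumptions: a |D|-distribution is represented as a Python dict mapping a point of [0, 1] to its probability mass (a representation chosen here, not taken from the paper), the distance is evaluated as the integral of the absolute difference of the cumulative distributions, and the helper names `bsc`, `bec`, `battacharyya` and `error_prob` are hypothetical.

```python
import math

def wasserstein(p, q):
    """d(a, b) = integral over [0, 1] of |P(z) - Q(z)| dz for discrete |D|-distributions."""
    points = sorted(set(p) | set(q) | {0.0, 1.0})
    total, cp, cq = 0.0, 0.0, 0.0
    for left, right in zip(points, points[1:]):
        cp += p.get(left, 0.0)   # cumulative mass of p up to and including `left`
        cq += q.get(left, 0.0)
        total += abs(cp - cq) * (right - left)
    return total

def battacharyya(p):
    # Battacharyya kernel in the |D|-domain is sqrt(1 - z^2).
    return sum(m * math.sqrt(1.0 - z * z) for z, m in p.items())

def error_prob(p):
    # Error-probability kernel in the |D|-domain is (1 - z) / 2.
    return sum(m * (1.0 - z) / 2.0 for z, m in p.items())

def bsc(cross):
    return {1.0 - 2.0 * cross: 1.0}      # BSC: |D| is the constant 1 - 2p

def bec(eps):
    return {0.0: eps, 1.0: 1.0 - eps}    # BEC: |D| is 0 (erasure) or 1

DELTA_0 = {0.0: 1.0}      # useless channel
DELTA_INF = {1.0: 1.0}    # perfect channel

if __name__ == "__main__":
    a, b = bec(0.3), bsc(0.2)
    d = wasserstein(a, b)
    print(d, abs(battacharyya(b) - battacharyya(a)), math.sqrt(d * (2 - d)))
    print(wasserstein(a, DELTA_0), 1 - 2 * error_prob(a))
    print(wasserstein(a, DELTA_INF), 2 * error_prob(a), battacharyya(a))
```

The first line compares the Battacharyya gap with the bound of (ix); the last two lines evaluate (30) and (31).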
Appendix D

Wasserstein Metric and Degradation - Lemma 14
Proof.

(i) Wasserstein versus Degradation: Let $f$ be a function of
bounded total variation on [0, 1]. (This implies that $f$
has left and right limits.) Note that we include $|f(0-)|$
and $|f(1+)|$ in the definition of total variation, which we
denote by $\int_0^1 |f'(x)|\,dx$. Define $F(x) = \int_0^x f(z)\,dz$. We
claim that if $F \ge 0$ then

$$\Bigl(\int_0^1 F(x)\,dx\Bigr)\Bigl(\int_0^1 |f'(x)|\,dx\Bigr) \ge \Bigl(\int_0^1 |f(x)|\,dx\Bigr)^2.$$

This claim implies statement (i) by setting $f(z) = \bigl(|\mathfrak{B}|(1 - z) - |\mathfrak{A}|(1 - z)\bigr)$
and noting that, in this case,
$\int_0^1 |f'(z)|\,dz \le 2$.

We now prove the claim. Let $S$ be the set of points $x$
in [0, 1], including the endpoints, where $f(x-)f(x+) \le 0$.
Note that $S$ is closed and we may assume $f = 0$
on $S$. The complement of $S$ is a collection of disjoint
open intervals such that $f$ is either strictly positive or
strictly negative in each interval. Consider the subset of
intervals on which $f$ is strictly negative. Without loss
of generality we may take this collection to be finite.
Indeed, suppose there are countably infinitely many such
intervals $J_1, J_2, \dots$ Define an approximation $f_k$ by setting
$f_k(x) = -f(x)$ for $x \in \cup_{i \ge k+1} J_i$ and $f_k(x) = f(x)$
otherwise. Then $F_k(x) = \int_0^x f_k(z)\,dz \ge F(x) \ge 0$ and
$F_k \to F$ uniformly. Furthermore, $\int_0^1 |f_k(x)|\,dx = \int_0^1 |f(x)|\,dx$
and $\int_0^1 |f_k'(x)|\,dx$ converges to $\int_0^1 |f'(x)|\,dx$ from below.

By taking unions of intervals as necessary we can find an
increasing sequence $0 = x_1, x_2, \dots, x_{2k}, x_{2k+1} = 1$ such
that on $I_i = [x_i, x_{i+1}]$ we have $f \ge 0$ for $i$ odd and
$f \le 0$ for $i$ even. The sequence of points $x_i$ is strictly
increasing except possibly for the last pair which may
coincide at 1. Define

$$h_i = \max_{x \in I_i} |f(x)|, \qquad w_i = \Bigl|\int_{I_i} f(x)\,dx\Bigr| / h_i = \int_{I_i} |f(x)|\,dx / h_i,$$

where $w_i = 0$ if $h_i = 0$. Note that $w_i \le |I_i|$. We have

$$\int_0^1 |f'(x)|\,dx \ge 2\sum_{i=1}^{2k} h_i, \qquad \int_0^1 |f(x)|\,dx = \sum_{i=1}^{2k} h_i w_i.$$

We claim in addition that

$$2\int_0^1 F(x)\,dx \ge \sum_{i=1}^{2k} h_i w_i^2.$$

The desired result then follows from Jensen's inequality

$$\frac{\sum_{i=1}^{2k} h_i w_i^2}{\sum_{i=1}^{2k} h_i} \ge \Bigl(\frac{\sum_{i=1}^{2k} h_i w_i}{\sum_{i=1}^{2k} h_i}\Bigr)^2.$$

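Now, note that

$$\int_0^1 F(x)\,dx = \int_0^1 (1-x)f(x)\,dx.$$

It is straightforward to show that for odd $i$ we have

$$\int_{x_i}^{x_{i+1}} (1 - x)f(x)\,dx \ge \frac{1}{2}\bigl((\bar{x}_{i+1} + w_i)^2 - \bar{x}_{i+1}^2\bigr)h_i$$

and for even $i$ we have

$$\int_{x_i}^{x_{i+1}} (1 - x)f(x)\,dx \ge -\frac{1}{2}\bigl(\bar{x}_i^2 - (\bar{x}_i - w_i)^2\bigr)h_i,$$

where $\bar{x} = 1 - x$. Indeed, for odd $i$ we have
$\int_{x_i}^{z} \bigl(f(x) - h_i\mathbb{1}_{\{x \ge x_{i+1} - w_i\}}\bigr)\,dx \ge 0$ for all $z \in [x_i, x_{i+1}]$ with
equality at $z = x_{i+1}$. Hence

$$\int_{x_i}^{x_{i+1}} (1 - x)\bigl(f(x) - h_i\mathbb{1}_{\{x \ge x_{i+1}-w_i\}}\bigr)\,dx = \int_{x_i}^{x_{i+1}} \Bigl(\int_{x_i}^{z} \bigl(f(x) - h_i\mathbb{1}_{\{x \ge x_{i+1} - w_i\}}\bigr)\,dx\Bigr)dz \ge 0,$$

which gives

$$\int_{x_i}^{x_{i+1}} (1-x)f(x)\,dx \ge \int_{x_i}^{x_{i+1}} (1 - x)\,h_i\mathbb{1}_{\{x \ge x_{i+1} - w_i\}}\,dx.$$

The argument for even $i$ is similar. We obtain

$$2\int_0^1 (1-x)f(x)\,dx \ge \sum_{i=1}^{k}\Bigl(h_{2i-1}w_{2i-1}^2 + h_{2i}w_{2i}^2 + 2(h_{2i-1}w_{2i-1} - h_{2i}w_{2i})\bar{x}_{2i}\Bigr).$$

Defining $\bar{x}_{2k+2} = 0$ for notational convenience, we can
write

$$2\int_0^1 (1-x)f(x)\,dx - \sum_{i=1}^{2k} h_i w_i^2 \ge 2\sum_{i=1}^{k}(h_{2i-1}w_{2i-1} - h_{2i}w_{2i})\bar{x}_{2i}$$
$$= 2\sum_{i=1}^{k}\Bigl(\sum_{j=1}^{i}(h_{2j-1}w_{2j-1} - h_{2j}w_{2j})\Bigr)(\bar{x}_{2i} - \bar{x}_{2(i+1)})$$
$$= 2\sum_{i=1}^{k} F(x_{2i+1})(\bar{x}_{2i} - \bar{x}_{2(i+1)}) \ge 0,$$

and the proof is complete.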
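(ii) Entropy and Battacharyya Bound Wasserstein: Let us first
focus on the inequality between the Wasserstein distance
and the Battacharyya parameter. From point (i) we know
that

$$d(\mathsf{a},\mathsf{b}) \le 2\sqrt{\int_0^1 z\bigl(|\mathfrak{B}|(z)-|\mathfrak{A}|(z)\bigr)\,dz} = 2\sqrt{\int_0^1 \Bigl(\int_z^1 \bigl(|\mathfrak{B}|(x)-|\mathfrak{A}|(x)\bigr)\,dx\Bigr)dz}.$$

By integrating by parts twice we have

$$\mathfrak{B}(\mathsf{a}) = \int_0^1 \sqrt{1- z^2}\,|\mathfrak{a}|(z)\,dz = \int_0^1 (1-z^2)^{-\frac{3}{2}}\Bigl(\int_z^1 |\mathfrak{A}|(x)\,dx\Bigr)dz, \quad (32)$$

and

$$\mathrm{H}(\mathsf{a}) = \int_0^1 h_2\Bigl(\frac{1-z}{2}\Bigr)|\mathfrak{a}|(z)\,dz = \frac{1}{\ln 2}\int_0^1 \frac{1}{1-z^2}\Bigl(\int_z^1 |\mathfrak{A}|(x)\,dx\Bigr)dz.$$

Thus we obtain

$$\int_0^1 z\bigl(|\mathfrak{B}|(z)-|\mathfrak{A}|(z)\bigr)\,dz \le (\ln 2)\bigl(\mathrm{H}(\mathsf{b})-\mathrm{H}(\mathsf{a})\bigr) \le \mathfrak{B}(\mathsf{b})-\mathfrak{B}(\mathsf{a}).$$

This yields

$$d(\mathsf{a}, \mathsf{b}) \le 2\sqrt{(\ln 2)\bigl(\mathrm{H}(\mathsf{b}) - \mathrm{H}(\mathsf{a})\bigr)} \le 2\sqrt{\mathfrak{B}(\mathsf{b}) - \mathfrak{B}(\mathsf{a})}.$$

For the final inequality first note that
$g(z) = (1 - z^2)^{-1}\bigl(\int_z^1 (|\mathfrak{B}|(x) - |\mathfrak{A}|(x))\,dx\bigr) \le 1$. Let
$v = \int_0^1 g(z)\,dz = (\ln 2)(\mathrm{H}(\mathsf{b}) - \mathrm{H}(\mathsf{a}))$. It follows that

$$\mathfrak{B}(\mathsf{b}) - \mathfrak{B}(\mathsf{a}) = \int_0^1 \frac{1}{\sqrt{1-z^2}}\,g(z)\,dz \le \int_{1-v}^{1} \frac{1}{\sqrt{1-z^2}}\,dz = \arccos(1 - v)$$
$$\le \frac{\pi}{2}\sqrt{v} = \frac{\pi}{2}\sqrt{\ln 2\,(\mathrm{H}(\mathsf{b})-\mathrm{H}(\mathsf{a}))} \le \sqrt{2(\mathrm{H}(\mathsf{b})-\mathrm{H}(\mathsf{a}))}.$$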
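The two elementary facts used in the last chain, $\arccos(1-v) \le \frac{\pi}{2}\sqrt{v}$ on [0, 1] and $\frac{\pi}{2}\sqrt{\ln 2} \le \sqrt{2}$, can be checked numerically, and the ordering of the three quantities in (ii) can be illustrated on a pair of erasure channels. This is a minimal sketch; the BEC example and the function names are illustrative assumptions (for BEC($\epsilon$) the entropy, the Battacharyya parameter and the |D|-cumulative on [0, 1) all equal $\epsilon$).

```python
import math

def arccos_step_ok(v, tol=1e-12):
    # arccos(1 - v) <= (pi / 2) * sqrt(v) for v in [0, 1]; equality holds at v = 1.
    return math.acos(1.0 - v) <= 0.5 * math.pi * math.sqrt(v) + tol

def bec_chain(eps_a, eps_b):
    """For BEC(eps_a) upgraded w.r.t. BEC(eps_b), return
    (d(a, b), 2 sqrt(ln2 (H(b) - H(a))), 2 sqrt(B(b) - B(a)))."""
    gap = eps_b - eps_a  # equals d(a, b), H(b) - H(a) and B(b) - B(a) for BECs
    return gap, 2.0 * math.sqrt(math.log(2.0) * gap), 2.0 * math.sqrt(gap)

if __name__ == "__main__":
    print(all(arccos_step_ok(k / 1000.0) for k in range(1001)))
    print(0.5 * math.pi * math.sqrt(math.log(2.0)), math.sqrt(2.0))
    print(bec_chain(0.2, 0.5))
```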

(iii) Continuity for Ordered Families: Assume that $\mathsf{a} \prec \mathsf{b}$.
From point (ii) we know that

$$d(\mathsf{a},\mathsf{b}) \le 2\sqrt{\mathfrak{B}(\mathsf{b})- \mathfrak{B}(\mathsf{a})},$$

and the continuity follows from the continuity of the
Battacharyya parameter for smooth channel families.

Appendix E

Sufficient Condition for Continuity - Lemma 17
Continuity for Large Entropies - Lemma 18
Universal Bound on Continuity Region - Lemma 19

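Lemma 50 (Bound on $\mathfrak{B}$): Consider two L-densities $\mathsf{a}_1 \prec \mathsf{a}_2$.
Then, for any degree distribution $\rho(\cdot)$,

$$\mathfrak{B}(\rho(\mathsf{a}_2)) - \mathfrak{B}(\rho(\mathsf{a}_1)) \le \bigl(\mathfrak{B}(\mathsf{a}_2)-\mathfrak{B}(\mathsf{a}_1)\bigr)\,\rho'\bigl(1 - \mathfrak{B}^2(\mathsf{a}_1)\bigr).$$

Proof: Let $\mathsf{a}$ be a density and let $U$ be distributed
according to the corresponding |D|-distribution. By Jensen's
inequality we have

$$\mathfrak{B}(\mathsf{a}) = \mathbb{E}[(1 - U^2)^{\frac{1}{2}}] \le (\mathbb{E}[1 - U^2])^{\frac{1}{2}} = (1 - m_{\mathsf{a},1})^{\frac{1}{2}},$$

where we have introduced the notation $m_{\mathsf{a},k} = \mathbb{E}[U^{2k}]$. The
Taylor expansion $(1 - u^2)^{\frac{1}{2}} = 1 - \sum_{k=1}^{\infty} \alpha_k u^{2k}$ gives

$$\mathfrak{B}(\mathsf{a}) = 1 - \sum_{k=1}^{\infty} \alpha_k m_{\mathsf{a},k},$$

where $\alpha_k$ is positive for each $k$. The functionals $m_{\mathsf{a},k}$ have the
important (Fourier) property $m_{\rho(\mathsf{a}),k} = \rho(m_{\mathsf{a},k})$ [62].^5 Since
$u^k$ is convex and increasing for $k \ge 1$, we have $m_{\mathsf{a}_1,k} \ge m_{\mathsf{a}_2,k}$.

5 We introduced here only the even moments, since only these are needed.
The odd moments are multiplicative as well.

Hence,

$$\mathfrak{B}(\rho(\mathsf{a}_2)) - \mathfrak{B}(\rho(\mathsf{a}_1)) = \sum_{k=1}^{\infty} \alpha_k\bigl(\rho(m_{\mathsf{a}_1,k}) - \rho(m_{\mathsf{a}_2,k})\bigr)$$
$$\le \sum_{k=1}^{\infty} \alpha_k\,\rho'(m_{\mathsf{a}_1,k})(m_{\mathsf{a}_1,k} - m_{\mathsf{a}_2,k})$$
$$\le \rho'(m_{\mathsf{a}_1,1})\Bigl(\sum_{k=1}^{\infty} \alpha_k(m_{\mathsf{a}_1,k} - m_{\mathsf{a}_2,k})\Bigr)$$
$$\le \rho'\bigl(1 - \mathfrak{B}^2(\mathsf{a}_1)\bigr)\Bigl(\sum_{k=1}^{\infty} \alpha_k(m_{\mathsf{a}_1,k} - m_{\mathsf{a}_2,k})\Bigr)$$
$$= \rho'\bigl(1- \mathfrak{B}^2(\mathsf{a}_1)\bigr)\bigl(\mathfrak{B}(\mathsf{a}_2)-\mathfrak{B}(\mathsf{a}_1)\bigr).$$

■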
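As a sanity check, the bound of Lemma 50 can be evaluated in closed form on the binary erasure channel. The sketch below assumes a right-regular degree distribution $\rho(x) = x^{d_r-1}$ and uses the standard BEC facts that $\mathfrak{B}(\mathrm{BEC}(\epsilon)) = \epsilon$ and that the check-node convolution of $d_r - 1$ copies of BEC($\epsilon$) is BEC($1-(1-\epsilon)^{d_r-1}$); the function names are illustrative.

```python
def rho_prime(x, dr):
    # Derivative of rho(x) = x^(dr - 1).
    return (dr - 1) * x ** (dr - 2)

def batt_after_check(eps, dr):
    # Battacharyya parameter of rho(a) for a = BEC(eps).
    return 1.0 - (1.0 - eps) ** (dr - 1)

def lemma50_slack(eps1, eps2, dr):
    """Right-hand side minus left-hand side of Lemma 50 for BEC(eps1) upgraded w.r.t. BEC(eps2)."""
    lhs = batt_after_check(eps2, dr) - batt_after_check(eps1, dr)
    rhs = (eps2 - eps1) * rho_prime(1.0 - eps1 ** 2, dr)
    return rhs - lhs

if __name__ == "__main__":
    for eps1, eps2 in [(0.1, 0.2), (0.3, 0.9), (0.5, 0.5)]:
        print(eps1, eps2, lemma50_slack(eps1, eps2, dr=6))
```

A non-negative slack for every pair $\epsilon_1 \le \epsilon_2$ is what the lemma predicts.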

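Lemma 51 (Bound on Derivative of $\mathfrak{B}$): Consider two L-densities
$\mathsf{a}_1 \prec \mathsf{a}_2$. Let $0 \le h_1 \le h_2 \le 1$ and let $\mathsf{c}_{h_1}$ and
$\mathsf{c}_{h_2}$ denote the two corresponding channels from an ordered
family $\{\mathsf{c}_h\}$. Set $B_{h_i} = \mathfrak{B}(\mathsf{c}_{h_i})$ for $i = 1, 2$. Then, for any dd
pair $(\lambda, \rho)$

$$|\mathfrak{B}(T_{h_1}(\mathsf{a}_1))-\mathfrak{B}(T_{h_2}(\mathsf{a}_2))| \le \alpha\,|\mathfrak{B}(\mathsf{a}_1)-\mathfrak{B}(\mathsf{a}_2)| + |B_{h_1} - B_{h_2}|,$$

where $\alpha = B_{h_1}\lambda'(1)\rho'(1 - \mathfrak{B}^2(\mathsf{a}_1))$.

Proof: First, since $\mathfrak{B}(\mathsf{a} \circledast \mathsf{b}) = \mathfrak{B}(\mathsf{a})\,\mathfrak{B}(\mathsf{b})$, $\mathfrak{B}(T_h(\mathsf{a})) = B_h\lambda(\mathfrak{B}(\rho(\mathsf{a})))$.
Second, since $0 \le \lambda(x) \le 1$ and $\lambda'(x) \le \lambda'(1)$,
$|\lambda(x_1)-\lambda(x_2)| \le \lambda'(1)|x_1 - x_2|$ for all $x_1, x_2 \in [0,1]$.
This implies that $|\mathfrak{B}(T_{h_1}(\mathsf{a}_1))-\mathfrak{B}(T_{h_1}(\mathsf{a}_2))|$ is upper bounded
by $\lambda'(1)B_{h_1}|\mathfrak{B}(\rho(\mathsf{a}_1)) - \mathfrak{B}(\rho(\mathsf{a}_2))|$. Using the triangle
inequality, we get

$$|\mathfrak{B}(T_{h_1}(\mathsf{a}_1))-\mathfrak{B}(T_{h_2}(\mathsf{a}_2))|$$
$$\le |\mathfrak{B}(T_{h_1}(\mathsf{a}_1))-\mathfrak{B}(T_{h_1}(\mathsf{a}_2))| + |\mathfrak{B}(T_{h_1}(\mathsf{a}_2))-\mathfrak{B}(T_{h_2}(\mathsf{a}_2))|$$
$$\le \lambda'(1)B_{h_1}|\mathfrak{B}(\rho(\mathsf{a}_1))-\mathfrak{B}(\rho(\mathsf{a}_2))| + |B_{h_1} - B_{h_2}|. \quad (33)$$

The first term above can be bounded using Lemma 50. ■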
Proof of Lemma 17: Denote by $\mathsf{x}_h$ the BP FP for the channel
$\mathsf{c}_h$ and notice that any other FP $\tilde{\mathsf{x}}_h$ for the same channel
is necessarily upgraded with respect to $\mathsf{x}_h$, i.e., $\tilde{\mathsf{x}}_h \prec \mathsf{x}_h$.
Indeed, $\tilde{\mathsf{x}}_h \prec \Delta_0$. By applying the density evolution operator,
we deduce that $\tilde{\mathsf{x}}_h \prec \mathsf{x}_h^{(\ell)}$, where $\mathsf{x}_h^{(\ell)}$ is the density after $\ell$
iterations of BP. By taking the limit $\ell \to \infty$ we get $\tilde{\mathsf{x}}_h \prec \mathsf{x}_h$.
We conclude that if $\mathsf{x}_h$ does not satisfy (9) then neither can
any other FP for the same channel.

Assume on the other hand that $\mathsf{x}_h$ satisfies (9) and that there
exists a distinct FP for the same channel, necessarily upgraded
with respect to $\mathsf{x}_h$, also satisfying (9). Call this density $\tilde{\mathsf{x}}_h$. In
this case,

$$|\mathfrak{B}(\mathsf{x}_h) - \mathfrak{B}(\tilde{\mathsf{x}}_h)| \overset{\text{FPs}}{=} |\mathfrak{B}(T_h(\mathsf{x}_h)) - \mathfrak{B}(T_h(\tilde{\mathsf{x}}_h))| \overset{\text{Lemma 51}}{\le} (1-\delta)\,|\mathfrak{B}(\mathsf{x}_h)-\mathfrak{B}(\tilde{\mathsf{x}}_h)|,$$

a contradiction since $\delta > 0$. The above argument shows that
there can be at most one FP with this property and that this
FP must be the forward DE one.
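Let us now prove Lipschitz continuity, c.f. (10). Under our
hypotheses, the two FPs $\mathsf{x}_{h_1}$ and $\mathsf{x}_{h_2}$ are the BP FPs for
channels $\mathsf{c}_{h_1}$ and $\mathsf{c}_{h_2}$. Consider therefore the respective BP
sequences (starting with $\Delta_0$) $\{\mathsf{x}_{h_1}^{(\ell)}\}_{\ell \ge 0}$, $\{\mathsf{x}_{h_2}^{(\ell)}\}_{\ell \ge 0}$. For each $\ell$,
$\mathsf{x}_{h_1}^{(\ell)}$ (respectively $\mathsf{x}_{h_2}^{(\ell)}$) is degraded with respect to $\mathsf{x}_{h_1}$ (respectively
$\mathsf{x}_{h_2}$), and therefore satisfies the condition (9), since the
latter does. Furthermore, assuming without loss of generality
$h_2 > h_1$, we have $\mathsf{x}_{h_2}^{(\ell)} \succ \mathsf{x}_{h_1}^{(\ell)}$. Let $\delta_\ell = |\mathfrak{B}(\mathsf{x}_{h_1}^{(\ell)}) - \mathfrak{B}(\mathsf{x}_{h_2}^{(\ell)})|$.
Since DE is initialized with $\Delta_0$, we have $\delta_0 = 0$. By applying
Lemma 51 we get $\delta_{\ell+1} \le (1-\delta)\,\delta_\ell + |B_{h_1} - B_{h_2}|$, and therefore

$$\delta_\ell \le \bigl(1 + (1- \delta) + (1-\delta)^2 + \cdots + (1- \delta)^{\ell-1}\bigr)\,|B_{h_1}-B_{h_2}| \le \frac{1-(1-\delta)^{\ell}}{1-(1-\delta)}\,|B_{h_1} - B_{h_2}|.$$

The thesis follows by taking the $\ell \to \infty$ limit.

Proof of Lemma 18: For $\beta \in [0, 1]$ define

$$g(\beta) = \frac{\beta}{\bigl(1- (1-\beta^2)^{d_r-1}\bigr)^{\frac{d_l-1}{2}}}. \quad (34)$$

Note that $g(1) = 1$ and that $g(\beta)$ is continuous.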

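Assume that we run forward DE with the channel $\mathsf{c}$ and
that $\mathfrak{B}(\mathsf{c}) = g(\beta)$, for some $\beta \in [0, 1]$. We then claim that
for the resulting FP $\mathsf{x}$, $\mathfrak{B}(\mathsf{x}) \ge \beta$. To see this, let $\{\mathsf{x}^{(\ell)}\}$
denote the sequence of densities with $\mathsf{x}^{(0)} = \Delta_0$. Using
the Battacharyya functional on the DE equations and then
extremes of information combining bounds we see that

$$\mathfrak{B}(\mathsf{x}^{(\ell)}) \ge \mathfrak{B}(\mathsf{c})\bigl(1 - (1 - \mathfrak{B}(\mathsf{x}^{(\ell-1)})^2)^{d_r-1}\bigr)^{\frac{d_l-1}{2}}.$$

Note that if $\mathfrak{B}(\mathsf{x}^{(\ell-1)}) \ge \beta$ then

$$\mathfrak{B}(\mathsf{x}^{(\ell)}) \ge \mathfrak{B}(\mathsf{c})\bigl(1 - (1 - \mathfrak{B}(\mathsf{x}^{(\ell-1)})^2)^{d_r-1}\bigr)^{\frac{d_l-1}{2}} \ge g(\beta)\bigl(1-(1-\beta^2)^{d_r-1}\bigr)^{\frac{d_l-1}{2}} = \beta.$$

The induction is anchored by noting that $1 = \mathfrak{B}(\Delta_0) \ge \beta$
since we assumed that $\beta \in [0,1]$. In summary, for each $\beta \in (0, 1]$,
equation (34) gives us the lower bound $\mathfrak{B}(\mathsf{x}) \ge \beta$ for the
FP $\mathsf{x}$ of forward DE with the channel $\mathfrak{B}(\mathsf{c}) = g(\beta)$. Another
way of interpreting (34) is that it gives us an upper bound on
$\mathfrak{B}(\mathsf{c})$ if we fix $\mathfrak{B}(\mathsf{x}) = \beta$.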
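The role of $g(\beta)$ can be illustrated on the binary erasure channel, where forward DE is a scalar recursion and the Battacharyya parameter of BEC($\epsilon$) equals $\epsilon$. The following sketch assumes the form of $g$ in (34) and a $(d_l, d_r)$-regular ensemble; the iteration count and the sample values of $\beta$ are arbitrary, and only values with $g(\beta) \le 1$ correspond to an actual erasure probability.

```python
def g(beta, dl, dr):
    # Equation (34): g(beta) = beta / (1 - (1 - beta^2)^(dr-1))^((dl-1)/2).
    return beta / (1.0 - (1.0 - beta ** 2) ** (dr - 1)) ** ((dl - 1) / 2.0)

def bec_forward_de(eps, dl, dr, iterations=5000):
    # Forward DE for the BEC started from the useless channel (x = 1):
    # x <- eps * (1 - (1 - x)^(dr-1))^(dl-1).
    x = 1.0
    for _ in range(iterations):
        x = eps * (1.0 - (1.0 - x) ** (dr - 1)) ** (dl - 1)
    return x

if __name__ == "__main__":
    dl, dr = 3, 6
    for beta in (0.5, 0.7, 0.9, 1.0):
        eps = g(beta, dl, dr)
        print(beta, eps, bec_forward_de(eps, dl, dr))
```

In each row the last column (the Battacharyya parameter of the forward DE fixed point) should be at least the first column, as the induction argument predicts.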

According to Lemma 17 the GEXIT curve is Lipschitz
continuous (in the Battacharyya parameter) at the FP $(\mathsf{c}_h, \mathsf{x}_h)$
if

$$\mathfrak{B}(\mathsf{x}_h) > \sqrt{1-\bigl(B_h(d_l-1)(d_r-1)\bigr)^{-\frac{1}{d_r-2}}}. \quad (35)$$

Note that (34) as well as (35) (if we interpret the inequality
as an equality) give rise to curves in the $(\mathfrak{B}(\mathsf{c}), \mathfrak{B}(\mathsf{x}))$ space.
Inserting (34) into (35) gives us the points where these two
curves cross. If we set $\sqrt{x} = \mathfrak{B}(\mathsf{x}_h)$, massage the resulting
expression, and set it to 0, we get (11). As shown in the
subsequent Lemma 52, (11) has a unique positive solution
in (0, 1] (i.e., the two curves only cross once), $b(x) \le a(x)$
after this solution, and $g(\beta)$ is an increasing function above
this solution. The situation is shown in Figure 8.

Inserting this solution back into (34) gives us a value
of $\mathfrak{B}(\mathsf{c}_h)$ so that for all channels with larger Battacharyya
constant the densities generated by forward DE are non-trivial
and are Lipschitz continuous. This insertion is equivalent to
evaluating $c(x)$ at $x = \tilde{x}$.

Fig. 8. Consider the (3, 6)-regular ensembles. The C-shaped curve on the
right is (34). This curve has two branches. The top branch gives a tighter
bound and pairs $(\mathfrak{B}(\mathsf{c}), \mathfrak{B}(\mathsf{x}))$ generated by DE must lie above this branch.
The second curve, given by (35), denotes the region (above the curve) where
there can be at most one FP. The GEXIT curve for the BEC is shown as a
dashed curve. The portion of this GEXIT curve starting at (1, 1) which is
contained in the gray area is guaranteed to be smooth. (Horizontal axis: $\mathfrak{B}(\mathsf{c})$.)

Let us finish the proof by showing that $\mathfrak{B}(\mathsf{x}_h) \ge x_u(1)$ for
all $h \ge \tilde{h}$. Indeed, from the extremes of information combining
we have

$$\mathfrak{B}(\mathsf{x}_h) \le \bigl(1 - (1 - \mathfrak{B}(\mathsf{x}_h))^{d_r-1}\bigr)^{d_l-1},$$

where above we have replaced $\mathfrak{B}(\mathsf{c}_h) \le 1$. Above inequality
implies that either $\mathfrak{B}(\mathsf{x}_h) = 0$ or $\mathfrak{B}(\mathsf{x}_h) \in [x_u(1), 1]$. From the
above discussion we know that for $h \ge \tilde{h}$ the densities generated
by forward DE are non-trivial. Putting things together
we conclude that $\mathfrak{B}(\mathsf{x}_h) \ge x_u(1)$.

Lemma 52 (Unique Zero): For $d_r \ge d_l \ge 3$ let

$$a(x) = \bigl(1-(1-x)^{d_r-1}\bigr)^{d_l-1},$$
$$b(x) = (d_l - 1)^2(d_r - 1)^2\,x\,(1 -x)^{2(d_r-2)},$$
$$c(x) = \sqrt{x/a(x)}.$$

Then there is a unique solution of $a(x) = b(x)$ in the interval
(0, 1], call it $\tilde{x}$. Further, $c(x)$ is increasing for $x \in [\tilde{x}, 1]$.

Proof: Set $L = d_l - 1$ and $R = d_r - 1$, multiply the
equation by $1/L^2$ and set $y = (1-x)^R$. This gives the
equivalent equation $A(y) = B(y)$, where $A(y) = (1-y)^L/L^2$,
and $B(y) = R^2\bigl(y^{2-\frac{2}{R}} - y^{2-\frac{1}{R}}\bigr)$.

The function $A(y)$ is (i) decreasing and convex for $L \ge 2$,
(ii) $A(0) = 1/L^2 > 0$, (iii) $A(1) = 0$. The function $B(y)$ is
(i) increasing for $y \in [0, y_1 = (\frac{2R-2}{2R-1})^R]$, (ii) decreasing for
$y \in [y_1, 1]$, (iii) concave for $y \in [y_2 = (\frac{(2R-2)(R-2)}{(2R-1)(R-1)})^R, 1]$,
and (iv) $B(0) = B(1) = 0$. Note that $0 \le y_2 < y_1$ since we
assumed that $R \ge 2$.

We conclude that in the region $[0, y_1]$ there is exactly one
solution, call it $\tilde{y}$: there is at least one since $1/L^2 = A(0) > B(0) = 0$,
whereas, using $R \ge L \ge 2$,

$$A(y_1) < 1/L^2 \le R/8 < R\,2^{-3+\frac{1}{R}} \le R^2\,2^{-2+\frac{1}{R}}\bigl(2^{\frac{1}{R}} - 1\bigr) = B(\tfrac{1}{2}) \le B(y_1)$$

(since $y_1$ is the position
where $B(y)$ is maximized); and there is only one solution
since in $[0, y_1]$, $A(y)$ is strictly decreasing, whereas $B(y)$ is
increasing.

In the region $y \in [y_1, 1)$ there can be no further solution
since $A(y_1) < B(y_1)$, $A(1) = B(1) = 0$, and $A(y)$ is convex
whereas $B(y)$ is concave.
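The statement of Lemma 52 can be explored numerically for a concrete ensemble. The sketch below is an illustration, not a substitute for the proof: it takes the (3, 6)-regular case, counts sign changes of $a(x) - b(x)$ on a grid of (0, 1], locates the crossing by bisection (valid here because $a - b$ is negative near 0 and positive at 1), and checks on a grid that $c(x) = \sqrt{x/a(x)}$ is non-decreasing to the right of the crossing. Grid sizes and tolerances are arbitrary choices.

```python
import math

def a(x, dl, dr):
    return (1.0 - (1.0 - x) ** (dr - 1)) ** (dl - 1)

def b(x, dl, dr):
    return (dl - 1) ** 2 * (dr - 1) ** 2 * x * (1.0 - x) ** (2 * (dr - 2))

def c(x, dl, dr):
    return math.sqrt(x / a(x, dl, dr))

def sign_changes(dl, dr, n=10000):
    # Number of sign changes of a - b on the grid {1/n, 2/n, ..., 1}.
    diffs = [a(k / n, dl, dr) - b(k / n, dl, dr) for k in range(1, n + 1)]
    return sum(1 for u, v in zip(diffs, diffs[1:]) if u * v < 0)

def crossing(dl, dr, tol=1e-12):
    # Bisection on a - b, which is negative near 0 and positive at 1.
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if a(mid, dl, dr) - b(mid, dl, dr) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    root = crossing(3, 6)
    print(sign_changes(3, 6), root, c(root, 3, 6))
```

The printed value `c(root, 3, 6)` is the Battacharyya constant obtained by evaluating $c(x)$ at the crossing.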
Note that $b(x)$ starts at 0, then increases until it reaches
its maximum, and then decreases back to 0, which it reaches
at $x = 1$. Let $\hat{x}$ be the largest value within [0, 1] so that
$b(\hat{x}) = 1$ (we will verify shortly that this is well defined).
Since $b(\hat{x}) = 1$ but $a(x) \le 1$ for all $x \in [0, 1]$, it is clear that
$\hat{x} \le \tilde{x}$. Note that $\tilde{x}$ is obtained from $\tilde{y}$. Recall that we want
to show that $c(x)$ is increasing for $x \in [\tilde{x}, 1]$. We will show
the stronger statement that $c(x)$ is increasing for $x \in [\hat{x}, 1]$.
This is equivalent to showing that $x/a(x)$ is increasing in this
range. Note that $(x/a(x))' = p(x)q(x)$, where

$$q(x) = 1 - (1 - x)^{d_r-2}\bigl((d_l d_r - d_l - d_r)x + 1\bigr),$$

and $p(x) > 0$ for $x \in [0, 1]$. The factor $q(x)$ can be written
as $y^{d_r-1}(d_l d_r - d_l - d_r) - y^{d_r-2}(d_l d_r - d_l - d_r + 1) + 1$,
where $y = 1 - x$. This polynomial has two sign changes and
hence by Descartes' rule of signs at most two positive roots. It
follows that $q(x)$ has at most 2 roots for $x \le 1$. Since $q(0) = 0$
and $q(1) = 1$, there must be exactly one root of $q(x)$ in (0, 1]
and once the function is positive, it stays so within [0, 1]. It
therefore suffices to prove that $q(\hat{x}) \ge 0$. By definition of $\hat{x}$
we have $(1 - \hat{x})^{d_r-2} = \frac{1}{(d_l-1)(d_r-1)\sqrt{\hat{x}}}$. We therefore have
$q(\hat{x}) = r(z)|_{z=\hat{x}}$, where

$$r(z) = 1 - \frac{(d_l d_r - d_l - d_r)\sqrt{z}}{(d_l-1)(d_r-1)} - \frac{1}{(d_l-1)(d_r-1)\sqrt{z}}.$$

A quick check shows that $r(z) \ge 0$ for $z \in [\frac{1}{(d_l d_r - d_l - d_r)^2}, 1]$.
The proof will be complete if we can show that $\hat{x} \in [\frac{1}{(d_l d_r - d_l - d_r)^2}, 1]$.
We do this in two steps. We claim that

$$\hat{x} \ge \check{x} = c\,\frac{\ln\sqrt{(d_l-1)(d_r-1)}}{d_r-2}, \quad \text{where } c = \frac{1}{1 + \frac{\ln\sqrt{(d_l-1)(d_r-1)}}{d_r-2}},$$

and that $\check{x} \in [\frac{1}{(d_l d_r - d_l - d_r)^2}, 1]$. The second claim is immediate.
To see the first,

$$b(\check{x}) \ge (d_l - 1)^2(d_r - 1)\,c \ln\sqrt{(d_l - 1)(d_r - 1)}\;e^{2(d_r-2)\ln(1-\check{x})}$$
$$\ge (d_l - 1)^2(d_r - 1)\,c \ln\sqrt{(d_l - 1)(d_r - 1)}\;e^{-2(d_r-2)\frac{\check{x}}{1-\check{x}}}$$
$$= \frac{(d_l-1)(d_r-2)\ln\sqrt{(d_l-1)(d_r-1)}}{d_r - 2 + \ln\sqrt{(d_l - 1)(d_r - 1)}} > 1 = b(\hat{x}).$$

This shows that the maximum of $b(x)$ in [0, 1] is above
1 and so $\hat{x}$ is well defined. Since further, $b(x)$ is a unimodal
function and $\hat{x}$ was defined to be the largest value of $x \in [0, 1]$
so that $b(x) = 1$ it follows that $\hat{x} \ge \check{x}$, as claimed. ■

Proof of Lemma 19: Let $a(x)$, $b(x)$ and $c(x)$ be as defined
in Lemma 18. We will provide an upper bound on the unique
solution of $a(x) = b(x)$. Notice that $a(x)$ represents the DE
equations for a BEC with parameter $\epsilon = 1$. Therefore, we
know that for $x \ge x_u(1)$, $a(x) \ge x$. We claim that $b(x)$ and
$l(x) = x$ intersect only at one point in (0, 1]. Indeed $b(x) = x$,
$x \in (0, 1]$, is equivalent to

$$x = 1-\bigl((d_l-1)(d_r-1)\bigr)^{-\frac{1}{d_r-2}} = \bar{x}.$$

Since $b(1) = 0$, whereas $l(1) = 1$, we conclude that for $x \in [\bar{x}, 1]$,
$b(x) \le x$.

We further claim that $\bar{x} \ge x_u(1)$. Let us assume this for a
moment. Then we have $a(x) \ge x \ge b(x)$ for $x \in [\bar{x}, 1]$. We
conclude that the unique solution of $a(x) = b(x)$ in (0, 1] is
upper bounded by $\bar{x}$.

We finish the lemma by proving $\bar{x} \ge x_u(1)$. Indeed, since
$\bar{x} \ne 0$, all we need to show is that $(1 - (1 - \bar{x})^{d_r-1})^{d_l-1} \ge \bar{x}$.
For $3 = d_r = d_l$ one can verify the validity of the claim
directly. In general, we have

$$\bigl(1 - (1 - \bar{x})^{d_r-1}\bigr)^{d_l-1} \ge \bigl(1 - (1 - \bar{x})^{d_r-2}\bigr)^{d_l-1} = \Bigl(1 - \frac{1}{(d_l - 1)(d_r - 1)}\Bigr)^{d_l-1}$$
$$\ge 1 - \frac{1}{d_r - 1} \overset{d_r \ge 4}{\ge} 1 - \Bigl(\frac{1}{(d_l - 1)(d_r - 1)}\Bigr)^{\frac{1}{d_r-2}} = \bar{x},$$

where the last inequality follows since $\bigl(\frac{1}{(d_l-1)(d_r-1)}\bigr)^{\frac{1}{d_r-2}} \ge \frac{1}{d_r-1}$.

The Battacharyya parameter of the channel is thus upper 
bounded by y'x/aijE). Using the upper bound on the entropy 
in Lemma @] we get the claimed bound. 

It remains to show that this bound converges to when we 
fix the rate and let the dds tend to infinity. To simplify our 
notation, let L = di — 1 and R = d r — 1. We have 



li = \/x/a(x) = y^I 



1 - (Li?)" 7 ^ ) (l - (LR) —1 



(a) ! / 1 ! / la(RL) 

< Y 1 - (LR) R - 1 =ei\Jl-e 



< e 4 V 1 - e v^h^t < 



e 1 



ls/2 



(d r -2)i 

where (a) is obtained by using the following sequence of 
inequalities, 



l-(LR)-T^) = \J e -LHi-(LR) 



") 



Taylor Expansion 
forln(l— x) \ 

< V 



_ R 

L(LR) ^-1 
Z R 

R-l < 



6 1-(LR) 



_l d r >4 1 

<g2(d r -2) < g4. 



We finish the proof by showing that h < h. and 03 (x h ) > 
x u (l) for h > h. Let us first show that h. < h. Note that 
h = hBMs(c(a;)), where recall that hBMs(-) is the function 
which maps the Battacharyya constant of an element of the 
family to the corresponding entropy. Thus we have h < c(x). 
The proof is now complete by observing that c(x) < c(x), 
due to the monotonicity of the function c(x) for x > x, as 
shown in Lemma l52l ■ 

Appendix F
Entropy Product Inequality - Lemma 21

By definition, we have 

H(a®b)=/ f \a\(x)\b\(y)k(x,y)dxdy, 
Jo Jo 

with the kernel as given in the statement. Differentiating, we 
have 



k v (x,y) 



1 f 1 + y 1 + xy 

in a; In 



21n(2)V 1-y 1-xyJ' 



6 Recall that for the BEC(1), the DE equation is given by $x = (1 - (1 - x)^{d_r-1})^{d_l-1}$.
Furthermore, there are 3 FPs namely, 0, $x_u(1)$ (unstable) and
1 (stable). Finally, we have that $(1 - (1 - x)^{d_r-1})^{d_l-1} \ge x$ if and only if
$x = 0$ or $x \in [x_u(1), 1]$. See Chapter 3 in [62] for more details.
kyy{x, y) — 



1 



k'r 



ln(2)(l-j/2)(i 
2 1 + 3x 2 y 2 



x 2 y 2 ) 



x 2 y 2 ) 3 



ln(2) (1 

Integrating by parts twice for each dimension, we see that 



H(a © b) = f 
Jo 



JO 



\a\(x)\b\(y)k(x,y)dxdy 



m(x)\ f B\(y)k xxyy (x,y)dxdy. 



This proves the alternative representation of this integral. 

Note that the bound (Q) is implied by (1 _, x \ 2 \3 < (1 — 
3 3 \ x y ) 

x 2 )~2 (l - y 2 )-^ . Let u = (1 - a; 2 ) -1 and t> = (1 - y 2 ) -1 . 
Then the desired inequality is equivalent to ^/u+i /l-i/ uv )3 < 
uiyi for tt, u > 1. Raising both sides to the power of | this 



becomes 



< uv . Multiplying both sides by 
this can be written as uv < (v + u — l) 2 which is 



(1/u+l/v-l/uv) 2 



1 , proving the 

- y 2 )~i 



equivalent to < (v — l) 2 + [u — 1) 
claim. 

This bound k xxyv (x,y) < 
immediately gives rise to the claim ( Imt : the right-hand side 
factorizes and, excluding the constant 8/ln(2), each factor is 
just the Battacharyya kernel in this representation ((1 — a; 2 ) - ' 

the Battacharyya kernel 



is the second derivative of \/l — 
in the |D|-domain, cf. d32ii). Note that we can use the 
upper bound on k xxyy (x,y) to obtain diTTb since by (fJJ, the 
differences (\<B' \(y) - \<8\(y)) and (|5l'|(x) - |2t|(x)) are non- 
negative. 

It remains to prove the claim ©. We claim that if d(b' , b) < 
S then ||Q5'|(y) — |Q3|(y)| < min{<5, 1—y}. The second bound is 
immediate since < |Q5|(y)| < 1 so that ||<8'|(y)- 

|*8|(y)| < $ v dy = 1 — y. To see that the difference is less 

Idz < 



than S we have < L ||«8'|(z)-|58| 



; 1 H®'l(z)-l®|(z)|dz 

H((a'-a)©(b'-b)) 
/ / ||2t'|(x) - |a|(s 



0, Lemma 1131 



d(b' , b). We now have 



< 



< 



ln(2) 
8 

M2) 



<B>\(y) -\<B\(y)\k xxyy (x,y)dxdy 

58(a' - a) / min{(5, 1 - y}(l - y 2 )-3dy 
Jo 



<8(a'-a)V2(5, 



where to obtain the second inequality we combine the upper 
bound on k xxyy (x,y) derived above with the alternative rep- 
resentation of 25(a) as given in d32l . 



Appendix G
Evaluation of GEXIT Integral - Lemma 26

For the proof of Lemma 26 it will be handy to have the
following two lemmas available.

Lemma 53 (Entropy of Single-Parity Check Code):
Consider a single-parity check code of length $d_r$. Let $X$
denote a codeword, chosen uniformly at random from this
code. Let $Y$ denote the result of passing the codeword
through a BMS channel with density $\mathsf{x}$. Then

$$\mathrm{H}(X \mid Y) = d_r\,\mathrm{H}(\mathsf{x}) - \mathrm{H}(\mathsf{x}^{\boxast d_r}).$$

Proof: Let $X_1, \dots, X_{d_r}$ be uniform random bits and let
$Z$ denote their parity. Suppose $X_i$ is transmitted through the
BMS channel with density $\mathsf{x}$. Let the received vector be $Y$.
The entropy of the single parity check code is $\mathrm{H}(X \mid Z = 0, Y)$.
By symmetry we have $\mathrm{H}(X \mid Z = 0, Y) = \mathrm{H}(X \mid Z = 1, Y) = \mathrm{H}(X \mid Z, Y)$.
Now $\mathrm{H}(X, Z \mid Y) = \mathrm{H}(X \mid Y) + \mathrm{H}(Z \mid X, Y) = \mathrm{H}(X \mid Y) = \sum_{i=1}^{d_r} \mathrm{H}(\mathsf{x})$,
but we also have
$\mathrm{H}(X, Z \mid Y) = \mathrm{H}(Z \mid Y) + \mathrm{H}(X \mid Z, Y) = \mathrm{H}(\mathsf{x}^{\boxast d_r}) + \mathrm{H}(X \mid Z, Y)$.
Thus, the entropy of the single parity check code is

$$\mathrm{H}(X \mid Z, Y) = d_r\,\mathrm{H}(\mathsf{x}) - \mathrm{H}(\mathsf{x}^{\boxast d_r}).$$

Now consider the channel that transmits a bit once through
the channel with density $\mathsf{a}$ and again through a channel with
density $\mathsf{b}$. The entropy of the combined channel is $\mathrm{H}(\mathsf{a} \circledast \mathsf{b})$.
This is equivalent to the single parity check code of two bits.
Hence

$$\mathrm{H}(\mathsf{a} \circledast \mathsf{b}) = \mathrm{H}(\mathsf{a}) + \mathrm{H}(\mathsf{b}) - \mathrm{H}(\mathsf{a} \boxast \mathsf{b}),$$

which proves (the Duality Rule of) Lemma 6. ■
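The duality rule can be verified directly for two binary symmetric channels. The sketch below is a small illustration under stated assumptions: BSC($p$) has entropy $h_2(p)$, the check-node convolution of BSC($p$) and BSC($q$) is the BSC with crossover $p(1-q) + q(1-p)$, and the variable-node convolution is evaluated by brute force as the conditional entropy of a uniform bit observed through both channels. The function names are illustrative.

```python
import math
from itertools import product

def h2(p):
    # Binary entropy function in bits.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def box_plus(p, q):
    # Check-node combination of BSC(p) and BSC(q): crossover of the XOR.
    return p * (1.0 - q) + q * (1.0 - p)

def repetition_entropy(p, q):
    """H(X | Y1, Y2) for a uniform bit X sent once over BSC(p) and once over BSC(q)."""
    total = 0.0
    for y1, y2 in product((0, 1), repeat=2):
        joint = [0.5 * (p if y1 != x else 1.0 - p) * (q if y2 != x else 1.0 - q)
                 for x in (0, 1)]
        p_y = sum(joint)
        for value in joint:
            if value > 0.0:
                total -= value * math.log2(value / p_y)
    return total

if __name__ == "__main__":
    p, q = 0.1, 0.2
    print(repetition_entropy(p, q), h2(p) + h2(q) - h2(box_plus(p, q)))
```

The two printed numbers should agree up to floating-point error.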
Lemma 54 (Entropy of Tree Code): Consider the $(d_l, d_r)$-regular
computation tree of height 2 (see e.g., Figure 9). This
tree represents a code of length $1 + d_l(d_r - 1)$ containing
$2^{1+d_l(d_r-2)}$ codewords. Let $X$ be chosen uniformly at random
from the set of codewords and let $Y$ be the result of sending
the components of $X$ through independent BMS channels. The
root node goes through the BMS channel $\mathsf{c}$ and all leaf nodes
are passed through the BMS channel $\mathsf{x}$. Then,

$$\mathrm{H}(X \mid Y) = \mathrm{H}(\tilde{\mathsf{x}}) + d_l(d_r - 1)\,\mathrm{H}(\mathsf{x}) - \mathrm{H}(\tilde{\mathsf{x}} \boxast \mathsf{x}^{\boxast d_r-1}) - (d_l - 1)\,\mathrm{H}(\mathsf{x}^{\boxast d_r-1}), \quad (36)$$

where $\tilde{\mathsf{x}} = \mathsf{c} \circledast (\mathsf{x}^{\boxast d_r-1})^{\circledast d_l-1}$.

Proof: Using the chain rule, rewrite $\mathrm{H}(X \mid Y)$ as

$$\mathrm{H}(X \mid Y) = \mathrm{H}(X_1 \mid Y) + \mathrm{H}(X_{\sim 1} \mid X_1, Y_{\sim 1}),$$

where $X_1$ corresponds to the root variable node and $X_{\sim 1}$ is the
set of all the leaf nodes. The first term is computed by density
evolution by considering all the independent messages flowing
from the leaf nodes into the root node. Indeed, we convolve
the channel density $\mathsf{c}$ with the densities coming from the $d_l$
check nodes, each of which has density $\mathsf{y} = \mathsf{x}^{\boxast d_r-1}$. Thus we
get

$$\mathrm{H}(X_1 \mid Y) = \mathrm{H}(\mathsf{c} \circledast \mathsf{y}^{\circledast d_l}) \overset{\tilde{\mathsf{x}} = \mathsf{c} \circledast \mathsf{y}^{\circledast d_l-1}}{=} \mathrm{H}(\tilde{\mathsf{x}} \circledast \mathsf{x}^{\boxast d_r-1})$$
$$\overset{\text{Lemma 6}}{=} \mathrm{H}(\tilde{\mathsf{x}}) + \mathrm{H}(\mathsf{x}^{\boxast d_r-1}) - \mathrm{H}(\tilde{\mathsf{x}} \boxast \mathsf{x}^{\boxast d_r-1}).$$

Further,

$$\mathrm{H}(X_{\sim 1} \mid X_1 = 0, Y_{\sim 1}) = \mathrm{H}(X_{\sim 1} \mid X_1 = 1, Y_{\sim 1}) = d_l\bigl[(d_r-1)\,\mathrm{H}(\mathsf{x})-\mathrm{H}(\mathsf{x}^{\boxast d_r-1})\bigr].$$

Indeed, when we condition on the root node to take either
0 or 1, we split the code into $d_l$ codes, each of which is a
single parity-check code of length $d_r - 1$. Using the previous
Lemma 53 we obtain the above expressions. Combining the
above statements proves the claim. ■
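Formula (36) can be compared against a brute-force computation on the erasure channel for a small tree. The sketch below rests on the following assumptions, none of which come from the paper's text: the root sees BEC(`eps_root`), every leaf sees BEC(`eps_leaf`), the entropy of BEC($\epsilon$) is $\epsilon$, check-node convolution of BECs gives erasure probability $1 - \prod(1-\epsilon_i)$, and variable-node convolution gives the product of erasure probabilities. For a linear code over the BEC, the conditional entropy given an erasure pattern is $\log_2$ of the number of codewords that vanish on the unerased positions. The pair (`eps_root`, `eps_leaf`) is deliberately not a fixed-point pair.

```python
import math
from itertools import product

def tree_codewords(dl, dr):
    """All codewords of the height-2 (dl, dr) computation tree (position 0 is the root)."""
    n = 1 + dl * (dr - 1)
    words = []
    for x in product((0, 1), repeat=n):
        blocks = [x[1 + j * (dr - 1): 1 + (j + 1) * (dr - 1)] for j in range(dl)]
        if all((x[0] + sum(block)) % 2 == 0 for block in blocks):
            words.append(x)
    return words

def brute_force_entropy(dl, dr, eps_root, eps_leaf):
    """H(X | Y) in bits, averaging log2(#consistent codewords) over erasure patterns."""
    words = tree_codewords(dl, dr)
    n = 1 + dl * (dr - 1)
    total = 0.0
    for erased in product((False, True), repeat=n):
        prob = 1.0
        for i, is_erased in enumerate(erased):
            eps = eps_root if i == 0 else eps_leaf
            prob *= eps if is_erased else 1.0 - eps
        consistent = sum(1 for w in words
                         if all(e or bit == 0 for e, bit in zip(erased, w)))
        total += prob * math.log2(consistent)
    return total

def formula_36(dl, dr, eps_root, eps_leaf):
    eta = 1.0 - (1.0 - eps_leaf) ** (dr - 1)      # H(x^{boxplus (dr-1)})
    x_tilde = eps_root * eta ** (dl - 1)          # H(x tilde)
    mixed = 1.0 - (1.0 - x_tilde) * (1.0 - eta)   # H(x tilde boxplus x^{boxplus (dr-1)})
    return x_tilde + dl * (dr - 1) * eps_leaf - mixed - (dl - 1) * eta

if __name__ == "__main__":
    print(len(tree_codewords(2, 3)))   # codeword count, 2^(1 + dl (dr - 2))
    print(brute_force_entropy(2, 3, 0.3, 0.4), formula_36(2, 3, 0.3, 0.4))
```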

Remark 55: We stress that in Lemma 54 $(\mathsf{c}, \mathsf{x})$ need not
form a FP pair. Thus $\tilde{\mathsf{x}}$ will be different from $\mathsf{x}$, in general.
We will use the above expression when $\mathsf{x}$ and $\tilde{\mathsf{x}}$ are "close"
(in the Wasserstein sense), i.e., $(\mathsf{c}, \mathsf{x})$ forms an approximate
FP pair. This will allow us to give an estimate of the entropy
of the tree code.

Proof of Lemma |26] Note first that the integral 
G(di, d r , {ch,x h }^) is well defined. This is true since we 
assumed that h > h. This implies that we are integrating over 
a continuous function (cf. Corollary l22t . Hence the integral 
exists. All that remains to be shown is that the value of this 
integral is indeed 1 — jf- — A, as claimed. 

To evaluate the integral we consider the code corresponding 
to the (di, d r )-regular computation tree of height 2 as in 
Lemmal54l Let X be chosen uniformly at random from the set 
of codewords and assume that the component corresponding 
to the root node is sent through the channel c h , whereas all 
components corresponding to the leaf nodes are sent through 
the channel x h . Let Y be the received word. Since {ch,x h } h 
is, by assumption, a FP family, the density flowing from any 



i+di(d r -i) pl 

£ 

i=1 



<9H(Xj | F(h)) <9hi(h) 



din 



dh 



dh. 



(37) 



GEXIT of leaf nodes 



The lhs evaluates to 



dh-H(X | Y) = U(X | y(l)) - H(X | Y(h*)) 



= ll + dk{dr - 1) - di 
H(x)(l + d;(d r -l))-H(x ffl ^ 



(d i -l)H(x ffld -- 1 )). 



The last inequality is obtained by using Lemma [54] for the two 
endpoints and recalling that we set x = x h » . 

Let us consider the leaf node contributions. By symmetry 
these contributions are all identical. If we focus on a single 
check node, then again due to symmetry, the GEXIT integrals 
of all leaf nodes is the same. But the sum of all the GEXIT 
integrals is equal to the change in entropy of a single-parity 
check code of length d r . Thus, using Lemma [53] we see that 
the integral of any single GEXIT integral is equal to 

j ((d r - 1) - (d r H(x) - H(x H ^))) . (38) 



check node into the root node is y h 



density seen by the variable node (excluding the observation of 
the variable node itself) is y® dl . Therefore, the GEXIT integral 
associated to the root of this tree code is the desired integral. 
We will evaluate this integral by first determining the sum of 
all the GEXIT integrals associated to this tree and then by 
subtracting from it the GEXIT integrals associated to the leaf 
nodes. 

In the sequel we will perform manipulations, such as writing 
a total derivative as the sum of its partial derivatives or writing 
a function as the integral of its derivative. In a first pass 
we will assume that all these operations are well defined. 
In a second step we will then see how to justify these steps 
by approximating the desired integrals by a series of simple 
integrals. 

Label the variable nodes of the tree with the set { 1 , . . . , 1 + 
di (d r — 1)} so that the root has label 1. Note that by assumption 
H(c h ) = h, so that the entropy of the first component of Y, 
call it hi, is h. The entropy of the remaining components, call 
them hj, i E {2, . . . , 1 + di(d r — 1)}, are all equal and take 
on the value H(x h ). So we imagine that all components are 
parameterized by h. 

From Definition [23] we have, 



and so the total Combining all these statements, we get 



G{dl , d r , {C h , Xh}*, ) = (l + d, {d r - 1) - dj) - 

H(x)(l + di{d r - 1)) - H(x Hd ") - (dj - l)H(x Hd '- 1 ) 

4(dr-l) 



(d r -l)-(d r H(x)-H(x H ^)) 



d r 



A. 



G(di,d r , {c h ,x h }^) = / 
A 



1 dU(X 1 \Y(h))dh 1 (h) 



dhi 



dh. 



Note that 



1 dn(Xi\Y(h) <9hi(h; 



i:)h 



dh- 



7 For completeness, although the exact marginal does not factor into the 
computation, note that there are 2 1 + d < ( dr-2 ) codewords in the code. Out of 
those, 2 d '( dr ~ 2 ^ have a in the root node. So the marginal of X\ = 0/1 
is one-half. 



It remains to justify the previous derivation. We proceed
as follows. Instead of working with $\{c_h, x_h\}$, we will work
with a simpler family which is piece-wise linear and "close"
to the original family. Because it is piece-wise linear, the
operations are simple to justify. Because it is "close" to the
original family, the result is "close" to what we want to show.
By taking a sequence of such families which approximate the
original family closer and closer, we obtain the desired result.

Let us start by constructing a piece-wise linear family, call
it $\{\hat{c}_h, \hat{x}_h\}$, which approximates the original family $\{c_h, x_h\}$.
Consider the channel family $\{c_h\}$ and sample it uniformly in
$h$ with a spacing of $\Delta h$. To be precise, pick the samples (from
the original family) at $i\Delta h$, for an appropriate range of integers
$i$. By a suitable choice we can ensure that $h^* = i\Delta h$ for some
$i \in \mathbb{N}$. In general, $h = 1$ will not be of the form $i\Delta h$. This
means that the last sample does not lie on the lattice. But we
can ensure that also for the last sample the "gap" (in entropy) is
at most $\Delta h$. This is all that is needed for the proof. Hence, for
notational convenience we will ignore this issue and assume
that all samples have the form $i\Delta h$.

Construct from this set of samples a family via piece-wise linear
interpolation, and call the result $\{\hat{c}_h\}$. Note that
since the entropy functional is linear, this construction leads
to a family so that $\mathrm{H}(\hat{c}_h) = h$. Further, $\{\hat{c}_h\}$ is ordered and
piece-wise smooth. We claim that

$$d(\hat{c}_h, c_h) = d(c_h, \alpha c_{i\Delta h} + \bar{\alpha} c_{(i+1)\Delta h}) \le 2\sqrt{\ln(2)\Delta h},$$

where $i = \lfloor h/\Delta h \rfloor$ and $\alpha \in [0, 1]$ is a suitable interpolation
factor. In the last step we have made use of
the convexity property of the Wasserstein distance (Lemma 13), and the
fact that consecutive samples have an entropy difference of
(at most) $\Delta h$. Further, since they are ordered, i.e.,
$c_{i\Delta h} \prec c_h \prec c_{(i+1)\Delta h}$, an entropy difference of at most $\Delta h$ implies
a Wasserstein distance of at most $2\sqrt{\ln(2)\Delta h}$ (cf. the lemma
relating entropy differences of ordered densities to the Wasserstein distance).
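The linearity of the entropy functional under this interpolation can be checked numerically. The following is a minimal sketch, assuming purely for illustration that the sampled channels are BSCs and that a density is represented as a finite mixture of BSCs given as (weight, crossover probability) pairs. The helper names `h2`, `h2_inv`, `entropy`, and `interpolate` are hypothetical and not part of the paper.

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def h2_inv(h, tol=1e-12):
    """Inverse of h2 restricted to [0, 1/2], computed by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h2(mid) < h:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def entropy(density):
    """Entropy of a mixture of BSCs; it is linear in the mixture weights."""
    return sum(weight * h2(p) for weight, p in density)

def interpolate(h, dh):
    """Piece-wise linear family: mix the samples at i*dh and (i+1)*dh."""
    i = math.floor(h / dh)
    alpha = (i + 1) - h / dh                 # weight on the left sample
    left = (alpha, h2_inv(i * dh))           # sample with entropy i*dh
    right = (1.0 - alpha, h2_inv((i + 1) * dh))
    return [left, right]

dh = 0.1
for h in (0.23, 0.5, 0.87):
    # The interpolated channel has entropy h up to the bisection tolerance.
    print(h, round(entropy(interpolate(h, dh)), 6))
```

The interpolated density is in general not a member of the original family (here it is a two-point mixture rather than a single BSC), which is why the Wasserstein bound above is needed to control the approximation.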

To each $\hat{c}_{h = i\Delta h}$ corresponds a FP $x_h$, call it $x_{i\Delta h}$. Take
the collection $\{x_{i\Delta h}\}$. Since this collection is ordered we can
construct from it an ordered and piece-wise smooth family
via a linear interpolation of consecutive samples, in the same
manner as we have done for the channel family. We have



in 



d(x(i + l)Ah,XiAh) < 2WS(x (4+1)Ah ) - < B(x l Ah) 



< 2Ji(Q3(c (8+1)Ah )- ( B(QAh) 




Step (i) follows from Lemma [14] property djjji. In step (ii) 
we made use of the fact that h* > h(di , d r , {c h }) , so that 
according to Lemma [TTl 5 > 1 — *8(ch*)(<ii — l)(d r — 1)(1 — 
23(x h * YY r ~ 2 > 0. In step (iii) we used once more Lemma [T4l 
property dn|i. Now consider the distance d(x h ,x h ). We have 

d{x h ,x h ) < ad(x h ,x iAh ) + ad(x h , x (i+1 ) Ah ) < J -(Ah) 3. 

The last inequality above follows from considering the same
steps as before, since the densities are ordered and each of
them is a FP at a channel, and the entropies of these channels differ by at most $\Delta h$.
Recall that $\{c_h, x_h\}$ is a FP family, hence we can write

<d(x h ,x h )+d(c h ©((x h ) ffld -- 1 )® d! - 1 ,eh©((Xh) Hd "- 1 )® d! - 1 ) 

<d(x h ,x h ) + 2d(c h ,c h ) + 2d((x h ffld ''- 1 )® d - 1 ,(x h sd ''- 1 )® d '- 1 ) 



<4 x /ln(2)Ah+ (4(dj - l)(dr - 1) + l)W-(Ah)3. 

V o 

In words, $\{\hat{c}_h, \hat{x}_h\}_{h \ge h^*}$ forms an approximate FP family.
Above, we have used properties (v) and (vi) of Lemma 13.

Let us now apply the family $\{\hat{c}_h, \hat{x}_h\}_{h \ge h^*}$ to the depth-2
tree. More precisely, we consider the depth-2 tree code where
the root node is passed through the channel $\hat{c}_h$ and the leaves
are passed through the channel $\hat{x}_h$. We claim that all GEXIT
integrals are well defined and that their sum is indeed the
difference of the entropies. Let us prove this claim in steps.

The root integral has the form 



E 



(i+l)Ah dh 
H((c (i+ i)Ah - QAh) © Zh 

?Ah 



Ah' 



where x t = - Lsd H^lAh + ({&] - s) x L&JAh and 
z h = ((xh) ffldr 1 )® dl . If we expand out z h explicitly then we 
see that the segment from i to (i + 1) has the form J2 a (~Eh ~ 
LshJ) Ja C Tssl — aT ) fc " ^i.ct for some fixed densities b i a which 
are various convolutions of two consecutive densities Xj Ah an d 
x (i+i)Ah an d some strictly positive integers j a and k a . Set 



a = — [ ~Eh\ )' so tnat a 8 oes f rom to 1 in each segment. 
Then in each segment the integral has the form 

/ H((c (i+1) Ah - QAh) © V (J Ja (1 - cr) fcQ bi, Q )dcr 
Jo y a ' 

= E (J " + 1 - ) | H (( C (»+l)Ah - QAh) © bi,„). 

So the root integral is in fact well defined. The same argument 
can be repeated for the leaf integrals to show that they are also 
well defined. 

If we consider one segment and add all the contributions 
(which as we saw can be written down explicitly) we can 
verify that the sum of all the GEXIT integrals is indeed equal 
to the difference of the entropy of the tree. This calculation is 
in principle straightforward but somewhat tedious, so we skip 
the details. 

If $\{\hat{c}_h, \hat{x}_h\}$ were a true FP family then the GEXIT integral
of the root node would be equal to $1 - \frac{d_l}{d_r} - A$. This follows by
the same steps which we used in our initial casual derivation:
once we know that all integrals exist and add up to the total
change in the entropy of the tree code, all that is needed to
draw this conclusion is to observe that for a true FP family we
can use a symmetry argument to compute the value of each
leaf GEXIT integral.

However, $\{\hat{c}_h, \hat{x}_h\}$ is only an approximate (in the Wasserstein
distance) FP family. But we know that by making $\Delta h$ sufficiently
small, we can make the approximation arbitrarily good.
It is intuitive that by taking a sequence of such approximations
which converges to a true FP family the limiting value of the
GEXIT integral of the root node should again be $1 - \frac{d_l}{d_r} - A$.
Let us show this more precisely.

We have already established that the sum of the individual
GEXIT integrals is equal to the total change of the entropy
of the tree code. This change only depends on the endpoints
but not on the chosen path. In particular, the endpoints for
$\{\hat{c}_h, \hat{x}_h\}_{h = h^*}^{1}$ and $\{c_h, x_h\}_{h = h^*}^{1}$ are the same.

All is left is therefore to prove that each leaf GEXIT integral 
has a value which approaches ( f38l > when Ah approaches 0. We 
know that this would be true if all the messages entering check 
nodes were x h and so the GEXIT integral was f., H(^ © 
xjf^-^dh. But the actual GEXIT integral is H(f^©z h )dh, 
where z h is the density flowing from the "interior" of the tree 
into a leaf node. Let us now show that 



V v dh 

In fact, let us show that 



,dx h 



dx h 



H(-^©Zh)-H(-^ 



~md r 



~Bd r 



dh 



Ah^O 



0. 



dh 



dh 



"^Idh 



Ah->0 



0. 



Note that for any he [h*, 1] we have 



d(xf^\z h )=d(x« 



~\Sd r — 1 ~\*\d r 



;"h 



fflc i ®(xf d '-- 1 )® dl ~ 1 ) 



jviit .Lemma 1131 

< 



Using the same line of reasoning as in the proof of Corollary 22
we see that therefore, for each $h$, the absolute difference of the
two integrands above converges to 0 as $\Delta h \to 0$. Since the integrand is also
bounded, it follows by Lebesgue's dominated convergence
theorem that also the integral of this quantity over $h$ converges
to 0 when $\Delta h$ is taken to 0.

The only thing which remains to be done is to prove that the
GEXIT integral of the root node when we use the linearized
family converges to the true GEXIT integral when we let $\Delta h$
tend to 0. We will do this in several steps by considering the
chain of integrals

(i) $G(d_l, d_r, \{c_h, x_h\}_{h^*}^{1})$,

(ii) $G(d_l, d_r, \{c_h, \bar{x}_h\}_{h^*}^{1})$,

(iii) $G(d_l, d_r, \{\hat{c}_h, \bar{x}_h\}_{h^*}^{1})$,

(iv) $G(d_l, d_r, \{\hat{c}_h, \hat{x}_h\}_{h^*}^{1})$,

and by showing that the values of consecutive such integrals
are arbitrarily close. Here, $\{\bar{x}_h\}$ is a family which is piece-wise
constant on each segment, taking on the value of its left
boundary.

First note that the integral in (i) is well defined, being the
integral over a continuous function. That the integrals in (i)
and (ii) are close follows by the same line of arguments as
we just used above. The same idea applies to prove that the
integrals (iii) and (iv) are close to each other. Finally, the values
of (ii) and (iii) are in fact equal. This is true since $\{\bar{x}_h\}$ is in
fact constant on each segment and $\{c_h\}$ agrees with $\{\hat{c}_h\}$ at
the endpoints of the segments.



Appendix H
Negativity - Lemma 27

We prove Lemma 27 by showing the following slightly
stronger statement.

Lemma 56: Let x be an L-density and consider a degree- 
distribution (di,d r ) such that d r > 1 + 5(^3— ) 3 . Define 

h = [(!)^ + iW ^W,hH and I 2 = - 

^-4(^-1X^)1 _ ^ where K>Q 

(i) Assume that x is a ^-approximate FP, i.e., d(x, c © 
( x Hdr-i)®d ( -i) < ^ for some channel c and 5 < 



(I^.ThenifHMeA,^-^. 
(ii) For H(x) G I 2 , A < -k. 

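Proof: Set $y = x^{\boxast d_r - 1}$. Let us first characterize the area
$A$ in a more convenient form. We have

$$A = \mathrm{H}(x) + \Big(d_l - 1 - \frac{d_l}{d_r}\Big)\mathrm{H}(x^{\boxast d_r}) - (d_l - 1)\mathrm{H}(y)$$

$$\phantom{A} = \mathrm{H}(x) - \frac{d_l}{d_r}\mathrm{H}(y) + \Big(d_l - 1 - \frac{d_l}{d_r}\Big)\big(\mathrm{H}(x^{\boxast d_r}) - \mathrm{H}(y)\big).$$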



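For the $L$-distributions $x$ and $y$ let $|\mathfrak{x}|$ and $|\mathfrak{y}|$ be the associated
$|D|$ distributions. Following the lead of L. Boczkowski [11], [12]
we write

$$\mathrm{H}(x) = \int_0^1 |\mathfrak{x}|(z)\, h_2\Big(\frac{1 - z}{2}\Big) dz = 1 - \int_0^1 |\mathfrak{x}|(z) \sum_{n \ge 1} \alpha_n z^{2n} dz$$

$$\phantom{\mathrm{H}(x)} = 1 - \sum_{n \ge 1} \alpha_n \int_0^1 |\mathfrak{x}|(z) z^{2n} dz = 1 - \sum_{n \ge 1} \alpha_n m_{x,n}. \qquad (39)$$

Here we have used the expansion of Lemma 49, where
$\alpha_n = \frac{1}{2\ln(2)\, n(2n - 1)}$, $n \ge 1$. Note that $\alpha_n \ge 0$ and that
$\sum_{n \ge 1} \alpha_n = 1$. Most importantly, as mentioned in the proof
of Lemma 50, the moments $m_{x,n}$ are multiplicative under $\boxast$.
This implies that for $d \ge 1$, $\mathrm{H}(x^{\boxast d}) = 1 - \sum_{n \ge 1} \alpha_n m_{x,n}^d$.
E.g., for two distributions $x$ and $y$ we have

$$1 - \mathrm{H}(x \boxast y) = 1 - \iint |\mathfrak{x}|(z_1)|\mathfrak{y}|(z_2)\, h_2\Big(\frac{1 - z_1 z_2}{2}\Big) dz_1 dz_2$$

$$= \iint |\mathfrak{x}|(z_1)|\mathfrak{y}|(z_2) \sum_{n \ge 1} \alpha_n z_1^{2n} z_2^{2n} dz_1 dz_2 = \sum_{n \ge 1} \alpha_n m_{x,n} m_{y,n},$$

where in the first equality we use that in the $|D|$-domain the
check node operation is simply a multiplication.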
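The moment expansion and the multiplicativity of the moments under the check-node convolution can be checked numerically on BSC densities. The sketch below assumes that the $|D|$-density of a BSC($p$) is a point mass at $z = 1 - 2p$, so that its moments are $(1-2p)^{2n}$, and that the series is truncated after a finite number of terms. The function names are hypothetical.

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def alpha(n):
    """Coefficient alpha_n = 1 / (2 ln(2) n (2n - 1))."""
    return 1.0 / (2.0 * math.log(2.0) * n * (2 * n - 1))

def entropy_from_moments(moments):
    """H = 1 - sum_n alpha_n m_n, where moments[n - 1] holds m_n."""
    return 1.0 - sum(alpha(n) * m for n, m in enumerate(moments, start=1))

def bsc_moments(p, terms=200):
    """Moments m_n = z^(2n) of a BSC(p), with z = 1 - 2p (truncated series)."""
    z = 1.0 - 2.0 * p
    return [z ** (2 * n) for n in range(1, terms + 1)]

p, q = 0.1, 0.2
mp, mq = bsc_moments(p), bsc_moments(q)

# Check-node convolution: the moments multiply term by term.
m_check = [a * b for a, b in zip(mp, mq)]

print(entropy_from_moments(mp), h2(p))
# The check-node convolution of BSC(p) and BSC(q) is BSC(p + q - 2pq).
print(entropy_from_moments(m_check), h2(p + q - 2 * p * q))
```

For general densities the moments are integrals rather than point evaluations, but the same term-by-term product applies.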

Assume at first that H(x) G [(|) , ^f- + gp^xp- ] and 
that x = c © y® d '~ 1 for some channel c. Define ip(x) = 
(1 - x)x dr - x . Then 



A = H(x) - ^-H(y) + (di - 1 - di/d r ) J2 a n ^(mx,n) 



<») d, (1 - 7r) dr 

< H(x) - ^H(y) + (di - 1 - di/d r y - ±+ 



(b) 



d r 



(c) 1 dl 

< 



1 



di - 1 - rf//rf r n _ J_ 
1 d; 



< H (x) - ^H(x)— + ; - 



d r — 1 
di 3 di 1 



2e d r ' 2(d r — l) 3 d r 4 ' d r e 8e c? r 
In (a) we used the bound ip(x) < 



d r -l 



so that 



(i_ 

£, n a n tp(mx,n) < jfzj ■ Consider step (b). Set H(y) 

Lemma 09] 

h 2 {p) > 4pp. Then 



H(x) = H^Sy 8 *" 1 ) < H(y® d,_1 ) < H(af*^) 

3 ( a fscrp)) = ( 4 ^^^ H 



Lem.f|] 



a BSC(pV 

di— 1 



In step (c) we substituted the upper and lower bounds on 
H(x) for the first and second expression respectively. Also, 
in the last inequality, we have 2 {dr-i) 3 — d (f — if) smce 
we assumed that d r > l + 5(d r /di)i > 1 + (2^-(f - |§ 
Let us summarize. If x = c © y ® d ' _1 and if H(x) € 

[(l)^.E^ + 2(^1 th ^ ^ < -ht- Let us dr °P 
the condition x = c © y® di 1 and assume instead that 
d(x, c © y ® di_1 ) < 5. Define x = c © y® d!_1 . Then 

A < H(x) - |-H(y) + B + (H(x) - H(x)) 

< H(x)-^H( y) + S + 3v^ < -l^ + 3V5 < 

a r 8e d r loe d r 

The one -before last step follows since if H(x) £ Zi then 

H (x) e [(f) + 2(5~rp ] and so we can a PP ! y ^ 

previous procedure. Also in the above computations we have 
used property ([IxJ of Lemma [T3l to bound |H(x) — H(x)| < 

For H(x) G f r - die- 4 ^- 1 ^^) 2 _ K ], 

A = H(x) - ^H(y) + (di - 1 - ^-) ^ 
< H(x) - |-H(y) + (d, - 1 - X) <' 



<H(x)-J H (y) + ( ( i ; -l-^)^(l-2^ 1 (H(x))) 2 ^- 1 ) 



< te(p) - £(1 - e-^-D*) + (d,-£)(l - 2 P f^-V 



d r 

< -K. 



. K * +die -H*r-w£k)* 

d r 



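In (a) we upper bound $\psi(x) = (1 - x)x^{d_r - 1}$ by $x^{d_r - 1}$, $x \in [0, 1]$,
and note that $m_{x,n} \in [0, 1]$. In (b) we use $m_{x,n} \le m_{x,1}$
(this is true since $x^{2n}$ is decreasing for each fixed $x \in [0, 1]$
as a function of $n$) and that $x^{d_r - 1}$ is increasing. Step (c)
is a consequence of the bound $m_{x,1} \le (1 - 2h_2^{-1}(\mathrm{H}(x)))^2$.
Let us prove this inequality. Equivalently, we want to show
$\mathrm{H}(x) \le h_2\big(\frac{1 - \sqrt{m_{x,1}}}{2}\big)$. By Jensen,

$$\int_0^1 |\mathfrak{x}|(z) z^{2n} dz \ge \Big(\int_0^1 |\mathfrak{x}|(z) z^{2} dz\Big)^{n}.$$

Using the above we have,

$$1 - \sum_{n \ge 1} \alpha_n m_{x,n} \le 1 - \sum_{n \ge 1} \alpha_n m_{x,1}^{n} = h_2\Big(\frac{1 - \sqrt{m_{x,1}}}{2}\Big).$$

The claim is proven by noticing that the lhs above is equal to
$\mathrm{H}(x)$.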

Step (d) uses the following lower bound on $\mathrm{H}(y) =
\mathrm{H}(x^{\boxast d_r - 1})$. Set $\mathrm{H}(x) = h_2(p)$. From extremes of information
combining we know that we get the lowest entropy if we
assume that $x$ is a BSC density. Therefore,

$$\mathrm{H}(y) \ge h_2\Big(\frac{1 - (1 - 2p)^{d_r - 1}}{2}\Big) \ge 1 - (1 - 2p)^{2(d_r - 1)}$$

$$\phantom{\mathrm{H}(y)} = 1 - e^{2(d_r - 1)\ln(1 - 2p)} \ge 1 - e^{-4(d_r - 1)p}.$$



Consider finally step (e). We know that h%{p) G l<x. Combined 
with d26]l and (l^p) 2 ^ -1 ) < e -4(d r -i)p we conc i u d e that 

■ 

Appendix I

Spacing of FPs - Lemma 57 and Transition Length
of FPs - Lemma 61

If we are given a proper one-sided FP (with any boundary
condition) then consecutive elements of the FP cannot be too
different from each other. This is made precise in the following
lemma.

Lemma 57 (Spacing of FP): Let $(c, x)$ be a proper one-sided
FP on $[-N, 0]$, $N > 0$, with any boundary condition.

(i) For $i \in [-N + 1, 0]$,

$$d(x_i, x_{i-1}) \le \frac{d_l - 1}{w}, \qquad \mathfrak{B}(x_i) - \mathfrak{B}(x_{i-1}) \le \frac{d_l - 1}{w}.$$

(ii) Let $\bar{x}_i$ denote the weighted average
$\bar{x}_i = \frac{1}{w^2}\sum_{j,k=0}^{w-1} x_{i+j-k}$. Then, for any $i \in [-\infty, \infty]$,

$$d(\bar{x}_i, \bar{x}_{i-1}) \le \frac{1}{w}, \qquad \mathfrak{B}(\bar{x}_i) - \mathfrak{B}(\bar{x}_{i-1}) \le \frac{1}{w}.$$

Discussion: Each of these two claims states that consecutive
distributions are "close" either with respect to the Wasserstein distance or
the Battacharyya parameter. Further, the difference is either
for the distributions themselves or their averages.
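Proof:

(i) To simplify notation, for $i \in [-N + 1, 0]$ fixed, let
$f_j = \big(\frac{1}{w}\sum_{k=0}^{w-1} x_{i+j-k-1}\big)^{\boxast d_r - 1}$. Writing the DE equations
explicitly,

$$x_i = c \circledast \Big(\frac{1}{w}\sum_{j=1}^{w} f_j\Big)^{\circledast d_l - 1}, \qquad x_{i-1} = c \circledast \Big(\frac{1}{w}\sum_{j=0}^{w-1} f_j\Big)^{\circledast d_l - 1}.$$

Note that the expressions for $x_i$ and $x_{i-1}$ are similar.
The only difference is that $x_i$ contains $f_w$ whereas $x_{i-1}$
contains $f_0$. Rewrite both expressions in the form

$$x_i = c \circledast \Big(\frac{1}{w}\sum_{j=1}^{w} a_j\Big)^{\circledast d_l - 1}, \qquad x_{i-1} = c \circledast \Big(\frac{1}{w}\sum_{j=1}^{w} b_j\Big)^{\circledast d_l - 1},$$

where $a_j = b_j = f_{j-1}$, $j = 2, \dots, w$, $a_1 = f_w$, and
$b_1 = f_0$. Now expand $x_i$ as well as $x_{i-1}$ in the form

$$x_i = \sum_{d_1, \dots, d_w:\, d_1 + \dots + d_w = d_l - 1} w^{-(d_l - 1)} \binom{d_l - 1}{d_1, \dots, d_w}\, a_1^{\circledast d_1} \circledast c_{d_2, \dots, d_w},$$

$$x_{i-1} = \sum_{d_1, \dots, d_w:\, d_1 + \dots + d_w = d_l - 1} w^{-(d_l - 1)} \binom{d_l - 1}{d_1, \dots, d_w}\, b_1^{\circledast d_1} \circledast c_{d_2, \dots, d_w},$$

where $c_{d_2, \dots, d_w} = a_2^{\circledast d_2} \circledast \dots \circledast a_w^{\circledast d_w} \circledast c$. Note that the
terms in the expansions of $x_i$ and $x_{i-1}$ with $d_1 = 0$ are
identical. Therefore, if we consider $\mathfrak{B}(x_i) - \mathfrak{B}(x_{i-1})$,
these terms cancel. We can upper bound the difference
by the Battacharyya constant of all those terms of the
expansion of $x_i$ which correspond to $d_1 \ge 1$, i.e.,

$$\mathfrak{B}(x_i) - \mathfrak{B}(x_{i-1}) \le w^{-(d_l - 1)} \sum_{d_1 \ge 1, \dots, d_w:\, d_1 + \dots + d_w = d_l - 1} \binom{d_l - 1}{d_1, \dots, d_w}\, \mathfrak{B}(a_1^{\circledast d_1} \circledast c_{d_2, \dots, d_w})$$

$$\le w^{-(d_l - 1)} \sum_{d_1 \ge 1, \dots, d_w:\, d_1 + \dots + d_w = d_l - 1} \binom{d_l - 1}{d_1, \dots, d_w} = 1 - \Big(1 - \frac{1}{w}\Big)^{d_l - 1} \le \frac{d_l - 1}{w}.$$

If we are interested in the Wasserstein distance instead,
we can proceed in an almost identical fashion. The only
difference is that in the last sequence of inequalities
we use the convexity property and the boundedness
property of (the Wasserstein metric) Lemma 13.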
(ii) Using the convexity property of (the Wasserstein
metric) Lemma 13 and canceling common terms, we get

$$d(\bar{x}_i, \bar{x}_{i-1}) = d\Big(\frac{1}{w^2}\sum_{j,k=0}^{w-1} x_{i+j-k},\ \frac{1}{w^2}\sum_{j,k=0}^{w-1} x_{i+j-k-1}\Big)$$

$$= \frac{1}{w^2}\, d\Big(\sum_{j=0}^{w-1} x_{i+j},\ \sum_{j=0}^{w-1} x_{i-1-j}\Big) \le \frac{1}{w}.$$

The proof for the Battacharyya parameter proceeds in
an identical fashion and uses the linearity of the
Battacharyya parameter.

■ 
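The combinatorial step in part (i) of the proof, namely that the multinomial weights with $d_1 \ge 1$ sum to $1 - (1 - 1/w)^{d_l - 1}$ and that this is at most $(d_l - 1)/w$, can be verified by brute-force enumeration. The following is a minimal sketch for small parameters; the function names are hypothetical and the enumeration is exponential in $w$, so it is only meant as a sanity check.

```python
from itertools import product
from math import factorial

def multinomial(counts):
    """Multinomial coefficient (sum(counts))! / prod(counts[i]!)."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

def mass_with_d1_positive(w, dl):
    """Sum of w^{-(dl-1)} * multinomial over all (d_1..d_w) with d_1 >= 1."""
    n = dl - 1
    acc = 0
    for counts in product(range(n + 1), repeat=w):
        if sum(counts) == n and counts[0] >= 1:
            acc += multinomial(counts)
    return acc / w ** n

for w, dl in ((3, 4), (4, 3), (5, 5)):
    exact = mass_with_d1_positive(w, dl)
    closed_form = 1.0 - (1.0 - 1.0 / w) ** (dl - 1)
    # exact equals the closed form and is bounded by (dl - 1) / w
    print(w, dl, exact, closed_form, (dl - 1) / w)
```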

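Lemma 58 (Basic Bounds on FP): Let $(c, x)$ be a proper
one-sided FP on $[-N, 0]$, $N > 0$, with any boundary condition.
Let $\mathfrak{B}_i = \mathfrak{B}(x_i)$ denote the Battacharyya parameter of the
density of the $i$-th section. Then for all $i \in [-N, 0]$,

$$\mathfrak{B}_i \le \mathfrak{B}(c)\Big(1 - \Big(1 - \frac{1}{w^2}\sum_{j,k=0}^{w-1} \mathfrak{B}_{i+j-k}\Big)^{d_r - 1}\Big)^{d_l - 1}.$$

Proof: For all $i \in [-N, 0]$,

$$x_i = c \circledast \Big(\frac{1}{w}\sum_{j=0}^{w-1}\Big(\frac{1}{w}\sum_{k=0}^{w-1} x_{i+j-k}\Big)^{\boxast d_r - 1}\Big)^{\circledast d_l - 1}.$$

Since the Battacharyya parameter is multiplicative in $\circledast$ and
linear,

$$\mathfrak{B}(x_i) = \mathfrak{B}(c)\Big(\frac{1}{w}\sum_{j=0}^{w-1} \mathfrak{B}\Big(\Big(\frac{1}{w}\sum_{k=0}^{w-1} x_{i+j-k}\Big)^{\boxast d_r - 1}\Big)\Big)^{d_l - 1}.$$

Further, recall from Lemma 5 property (iv), and the ensuing
discussion, that $\mathfrak{B}(a^{\boxast d_r - 1}) \le 1 - (1 - \mathfrak{B}(a))^{d_r - 1}$, so that

$$\mathfrak{B}\Big(\Big(\frac{1}{w}\sum_{k=0}^{w-1} x_{i+j-k}\Big)^{\boxast d_r - 1}\Big) \le 1 - \Big(1 - \frac{1}{w}\sum_{k=0}^{w-1} \mathfrak{B}_{i+j-k}\Big)^{d_r - 1}.$$

Combining, we get

$$\mathfrak{B}_i \le \mathfrak{B}(c)\Big(1 - \frac{1}{w}\sum_{j=0}^{w-1}\Big(1 - \frac{1}{w}\sum_{k=0}^{w-1} \mathfrak{B}_{i+j-k}\Big)^{d_r - 1}\Big)^{d_l - 1}.$$

Let $f(x) = (1 - x)^{d_r - 1}$, $x \in [0, 1]$. Since $f''(x) =
(d_r - 1)(d_r - 2)(1 - x)^{d_r - 3} \ge 0$, $f(x)$ is convex. Let
$y_j = \frac{1}{w}\sum_{k=0}^{w-1} \mathfrak{B}_{i+j-k}$. Then by Jensen,

$$\frac{1}{w}\sum_{j=0}^{w-1} f(y_j) \ge f\Big(\frac{1}{w}\sum_{j=0}^{w-1} y_j\Big),$$

which proves the claim. ■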
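The Jensen step at the end of the proof can be illustrated with a small numerical example. The sketch below takes a few arbitrary window averages $y_j$ (made-up values, not from the paper) and compares the bound before and after the application of Jensen's inequality. The function names are hypothetical.

```python
def f(x, dr):
    """The convex function f(x) = (1 - x)^(dr - 1) on [0, 1]."""
    return (1.0 - x) ** (dr - 1)

def bound_before_jensen(eps, ys, dl, dr):
    """eps * (1 - (1/w) sum_j f(y_j))^(dl - 1)."""
    avg_f = sum(f(y, dr) for y in ys) / len(ys)
    return eps * (1.0 - avg_f) ** (dl - 1)

def bound_after_jensen(eps, ys, dl, dr):
    """eps * (1 - f((1/w) sum_j y_j))^(dl - 1), the bound of Lemma 58."""
    y_bar = sum(ys) / len(ys)
    return eps * (1.0 - f(y_bar, dr)) ** (dl - 1)

ys = [0.05, 0.2, 0.6]          # illustrative window averages y_j
eps, dl, dr = 0.45, 3, 6       # illustrative Battacharyya parameter and degrees
tight = bound_before_jensen(eps, ys, dl, dr)
loose = bound_after_jensen(eps, ys, dl, dr)
print(tight, loose)            # the first value is never larger than the second
```

The bound after Jensen is weaker but depends on the constellation only through the doubly averaged Battacharyya parameter, which is what the subsequent arguments use.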

Lemma 59 (Basic Properties of h(x), 03]/): Consider the 

$(d_l, d_r)$-regular ensemble with $d_l \ge 3$ and let $\epsilon \in (\epsilon^{BP}, 1]$,
where $\epsilon^{BP}(d_l, d_r)$ is the BP threshold of the regular ensemble
when transmitting over the BEC. Define
$h(x) = \epsilon(1 - (1 - x)^{d_r - 1})^{d_l - 1} - x$.

(i) For $\epsilon > \epsilon^{BP}$, $h(x) = 0$ has exactly three solutions, one
of them being 0 and the other two denoted by $x_u(\epsilon)$ and
$x_s(\epsilon)$ with $0 < x_u(\epsilon) < x_s(\epsilon)$. Further, $h(x) \le 0$ for all
$x \in [0, x_u(\epsilon)]$ and $h(x) \ge 0$ for all $x \in [x_u(\epsilon), x_s(\epsilon)]$.

(ii) $h'(x_u(\epsilon)) > 0$ and $h'(x_s(\epsilon)) < 0$; $|h'(x)| \le d_l d_r$ for
$x \in [0, 1]$.

(iii) There exists a unique value $0 \le x_*(\epsilon) \le x_u(\epsilon)$ so that
$h'(x_*(\epsilon)) = 0$, and there exists a unique value $x_u(\epsilon) \le
x^*(\epsilon) \le x_s(\epsilon)$ so that $h'(x^*(\epsilon)) = 0$. Further, $h(x)$ is
decreasing in $[0, x_*(\epsilon)]$.

(iv) Let $\kappa_*(\epsilon) = \min\{-h'(0), -\frac{h(x_*(\epsilon))}{x_*(\epsilon)}\}$. The quantity $\kappa_*(\epsilon)$
is non-negative and depends only on the channel parameter
$\epsilon$ and the degrees $(d_l, d_r)$.



(v) For < e < 1, ar»(e) > 

(vi) For < e < 1, K»(e) > 

(vii) Let $\kappa_*$ and $x_*$ denote the universal lower bounds, given
in the previous parts, on $\kappa_*(\epsilon)$ and $x_*(\epsilon)$, respectively. If
we draw a line from 0 with slope $-\kappa_*$, then $h(x)$ lies
below this line for $x \in [0, x_*]$.

(viii) For $\epsilon \in (\epsilon^{BP}, 1]$ we have

$$x_u(\epsilon) \ge x_u(1) \ge (d_r - 1)^{-\frac{d_l - 1}{d_l - 2}}. \qquad (40)$$



Remark 60: The function $h(x)$ is the DE equation for the
$(d_l, d_r)$-regular ensemble when transmitting over the BEC.
The two non-zero solutions, $x_u(\epsilon)$ and $x_s(\epsilon)$, represent the
unstable and the stable FPs of DE [62]. In the following, we
will be using extremes of information combining techniques
to relate the Battacharyya parameters via $h(x)$.
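To make the role of $x_u(\epsilon)$ and $x_s(\epsilon)$ concrete, the following minimal sketch locates the non-zero roots of $h(x)$ numerically. It assumes the $(3, 6)$-regular ensemble, whose BP threshold over the BEC is approximately 0.4294, and uses a simple grid scan followed by bisection. The function names and the grid resolution are illustrative choices.

```python
def h(x, eps, dl, dr):
    """h(x) = eps * (1 - (1 - x)^(dr - 1))^(dl - 1) - x."""
    return eps * (1.0 - (1.0 - x) ** (dr - 1)) ** (dl - 1) - x

def bisect(fun, lo, hi, iters=100):
    """Bisection on an interval [lo, hi] over which fun changes sign."""
    f_lo = fun(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = fun(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def nonzero_fixed_points(eps, dl, dr, grid=2000):
    """Roots of h in (0, 1], found from sign changes on a uniform grid."""
    xs = [k / grid for k in range(1, grid + 1)]
    roots = []
    for a, b in zip(xs, xs[1:]):
        if (h(a, eps, dl, dr) > 0) != (h(b, eps, dl, dr) > 0):
            roots.append(bisect(lambda x: h(x, eps, dl, dr), a, b))
    return roots

# Above the BP threshold there are two non-zero roots, x_u < x_s.
print(nonzero_fixed_points(0.45, 3, 6))
# Below the BP threshold h(x) < 0 on (0, 1] and the list is empty.
print(nonzero_fixed_points(0.40, 3, 6))
```

A finite-difference check of the slope at the two roots is consistent with property (ii) of Lemma 59: $h$ crosses zero from below at $x_u(\epsilon)$ and from above at $x_s(\epsilon)$.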

In Figure 6 we see that within a few sections the constellation
changes from reliable sections (towards the boundary)
to sections which all have more or less the same reliability.
In other words, this transition happens quickly. This is made
precise in the following lemma.

Lemma 61 (Transition Length): Let $\epsilon^{BP}$ be the BP threshold
for transmission over the BEC using the $(d_l, d_r)$-regular
(uncoupled) ensemble. For $\epsilon \in (\epsilon^{BP}, 1]$, let $x_u(\epsilon)$ be the smaller of
the two strictly positive roots of the equation $h(x) = 0$, where
$h(x) = \epsilon(1 - (1 - x)^{d_r - 1})^{d_l - 1} - x$. For $0 \le \epsilon \le \epsilon^{BP}$, define
$x_u(\epsilon) = \lim_{\delta \downarrow \epsilon^{BP}} x_u(\delta)$.

Consider transmission over a BMS channel $c$. Let $w$ be
admissible in the sense of property (iv) of Definition 40. Let
$(c, x)$ be a proper one-sided FP on $[-N, 0]$ with any boundary
condition. Let $\mathfrak{B}_i = \mathfrak{B}(x_i)$ denote the Battacharyya parameter
of the density associated to the $i$-th section and define
$\epsilon = \mathfrak{B}(c)$.

Then, there exists a positive constant $c(d_l, d_r)$ which depends
on $d_l$ and $d_r$, but not on $N$ or the channel $c$, so that
for any $\delta > 0$

$$|\{i : \delta < \mathfrak{B}_i < x_u(\epsilon)\}| \le w\, \frac{c(d_l, d_r)}{\delta}.$$

Proof: Throughout the proof we set $\epsilon = \mathfrak{B}(c)$ and we
write $\mathfrak{B}_i$ for $\mathfrak{B}(x_i)$.

Note first that we have to prove the statement only for
$\epsilon \in (\epsilon^{BP}, 1]$. This is true since we have defined $x_u(\epsilon)$ to coincide
with $x_u(\epsilon^{BP})$ for $\epsilon \in [0, \epsilon^{BP}]$ and since further the function $h$,
which we use to bound the process, is strictly decreasing as
a function of $\epsilon$. Hence, in the sequel our language will reflect
the fact that we have $\epsilon \in (\epsilon^{BP}, 1]$.

(i) The number of sections such that $\mathfrak{B}_i \in [\delta, x_*(\epsilon)]$ is at
most $w\big(\frac{1}{\kappa_* \delta} + 1\big)$. If $\delta > x_*(\epsilon)$ then the number of sections
in this part is 0. Hence wlog assume $\delta \le x_*(\epsilon)$. Let $i$ be the
smallest index so that $\mathfrak{B}_i \ge \delta$. If $\mathfrak{B}_{i+(w-1)} \ge x_*(\epsilon)$ then the
claim is trivially fulfilled. Assume therefore that $\mathfrak{B}_{i+(w-1)} <
x_*(\epsilon)$. From the monotonicity of $g(\cdot)$ and the fact that $x$ is
increasing,

$$x_i = c \circledast g(x_{i-(w-1)}, \dots, x_i, \dots, x_{i+(w-1)}) \prec c \circledast g(x_{i+(w-1)}, \dots, x_{i+(w-1)}).$$

By extremes of information combining this implies

$$\mathfrak{B}_i \le \epsilon\, g(\mathfrak{B}_{i+(w-1)}, \dots, \mathfrak{B}_{i+(w-1)}).$$

As a consequence we get

$$\mathfrak{B}_{i+(w-1)} - \mathfrak{B}_i \ge \mathfrak{B}_{i+(w-1)} - \epsilon\, g(\mathfrak{B}_{i+(w-1)}, \dots, \mathfrak{B}_{i+(w-1)}) = -h(\mathfrak{B}_{i+(w-1)}) \ge \kappa_*(\epsilon)\delta,$$

where the last step uses parts (iii) and (vii) of Lemma 59.
This is equivalent to $\mathfrak{B}_{i+(w-1)} \ge \mathfrak{B}_i + \kappa_*(\epsilon)\delta$. More generally,
using the same line of reasoning, $\mathfrak{B}_{i+l(w-1)} \ge
\mathfrak{B}_i + l\kappa_*(\epsilon)\delta$, as long as $\mathfrak{B}_{i+l(w-1)} \le x_*(\epsilon)$.

We summarize: the total distance we have to cover is
$x_* - \delta$ and every $(w - 1)$ sections we cover a distance of
at least $\kappa_*(\epsilon)\delta$ as long as we have not surpassed $x_*(\epsilon)$.
Therefore, after $(w - 1)\lfloor\frac{x_*(\epsilon) - \delta}{\kappa_*(\epsilon)\delta}\rfloor$ sections we have either
passed $x_*$ or we must be strictly closer to $x_*$ than $\kappa_*(\epsilon)\delta$.
Hence, to cover the remaining distance we need at most
$(w - 2)$ extra sections. The total number of sections needed
is therefore upper bounded by $w - 2 + (w - 1)\lfloor\frac{x_*(\epsilon) - \delta}{\kappa_*(\epsilon)\delta}\rfloor$,
which, in turn, is upper bounded by $w\big(\frac{x_*(\epsilon)}{\kappa_*(\epsilon)\delta} + 1\big)$. The
final claim follows by bounding $x_*(\epsilon)$ with 1 and $\kappa_*(\epsilon)$ by $\kappa_*$.
(ii) The number of sections such that 05,; £ [x* (e), x M (e)] 



is at most 2w{ 



1) Let us define 



33j = _^sJ2j tk lq^i+j-k ■_ From _Lemma [58] 
*B t < eg@Si, »»-,. '.,»<) = 25, + Summing 
this inequality over all sections from — oo to k < we get, 



i= — oo i— — oo i=— oo 

Writing 2 Ji __ 00 25? in terms of the Q3^s and rearranging terms, 

- E m^^^eY""^ 1 )^-^) 

i— — OO 2—1 ^ ^ 



< ^"(25fc+(«;-l) — 2$k-(tu-l))- 



Let us summarize: 



05 fc+(w _ 1) -05 fc _ ( _ 1) >-- E ( 41 > 



Without loss of generality we can assume that there exists 
a section k so that x*(e) < *Bk-(w-i) ( we know from point 
(i) that we must reach this point unless the constellation is too 
short, in which case the statement is trivially fulfilled). Con- 
sider sections V&k—(w—i)i . . . , %$k+(w-i)> so tnat m addition 
25/c+O-i) < x u (e). If no such k exists then there are at most 
2w — 1 points in the interval [x* (e), x u (e)], and the statement 
is correct a fortiori. 

Our plan is to use (l4TT i to lower bound 



25fc+(«>-i) 



i) - ® fc - 



(«;-!)• 



This means, we need a lower 



bound for — ^ X^=-oo M®*)- Since by assumption 
05fc+(„,_i) < x„(e), it follows that 03fc < x u (e), so that 
every contribution in the sum — — X^=-oo M^i) i s positive 



(cf. Lemma [59] ©). Further, by (the Spacing) Lemma [57] 
wfJBi - *8i-i) < 1. Hence, 



(i 



/i(03i) > -6 E M®<)(»< - Bi-i) 

- 'X 

3«;*(e)(x*(e)) 2 



fc 

E 

z— — oo 

> 6k* (e) / xdx 

~ Jo 4 

Let us explain how we obtain the last inequality. First we 
claim that there must exist a section i with 05i between 
x*(e)/2 and x*(e). Indeed, suppose on the contrary that this 
was not true. Let k* < k be the smallest section number 
such that 05/c* > x*(e). Clearly, such a k* exists. Indeed, 
since x*(e) < ^Bk-iw-i), it follows that 03 fc > x*(e). Since 
OS-oo = 0, we must have 05fe*-i < x*(e)/2. This implies that 
05fe» — 05fc*_i > x*(e)/2. Using (the Spacing) Lemma |57] we 
conclude that > x*(e)/2. Hence w < 2d;/x*(e). Using 
the universal lower bound on x* (e), we get w < 2dfd 2 , a con- 
tradiction to the hypothesis of the lemma. Finally, according 
to Lemma [59] part (|iv]>, —h(x) > K*(e)x for x £ [0, x*(e)], 
which implies the inequality. Combined with fiTT i this implies 
that 



03 



k+(w-l) 



®fc-f>-l) > 



We summarize: the total distance we have to cover is $x_u(\epsilon) -
x_*(\epsilon)$ and every $2(w - 1)$ steps we cover a distance of at
least $\frac{3\kappa_*(\epsilon)(x_*(\epsilon))^2}{4}$ as long as we have not surpassed $x_u(\epsilon)$.
Allowing for $2(w - 1) - 1$ extra steps to cover the last part,
bounding again $w - 1$ by $w$, bounding $x_u(\epsilon) - x_*(\epsilon)$ by 1 and
replacing $\kappa_*(\epsilon)$ and $x_*(\epsilon)$ by their universal lower bounds,
proves the claim. ■

Appendix J
Saturation - Theorem 47

Before we proceed to prove the Saturation theorem, we
introduce a key technical element required in the proof, a
family of spatial (approximate) FPs. This is the content of
Definition 62 and Lemma 63. Then, Theorem 64 shows that
the GEXIT integral of this family depends only on its
end-points. Combined with the Negativity Lemma 27 this imposes
a strong constraint on the channel value of the spatial FPs,
culminating in the proof of the Saturation theorem.

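Definition 62 (Interpolation): Let $(c^*, x^*)$, $c^* \in \{c_h\}$, denote
an increasing one-sided constellation on $[-N, 0]$ for
the parameters $(d_l, d_r, w)$. Let $h^* = \mathrm{H}(c^*) > 0$ and let
$0 < L < N$.

The family (of constellations) for the $(d_l, d_r, L, w)$-ensemble,
based on $(c^*, x^*)$, is denoted by $\{c_\sigma, x_\sigma\}_{\sigma=0}^{h^*}$.

Each element $x_\sigma$ is symmetric with respect to the spatial
index and the components are indexed by $[-L, L]$. Hence it
suffices to define the constellations in the range $[-L, 0]$ and
then we set $x_{\sigma,i} = x_{\sigma,-i}$ for $i \in [0, L]$. As usual, we set
$x_{\sigma,i} = \Delta_{+\infty}$ for $i \notin [-L, L]$. For $i \in [-L, 0]$ and $\sigma \in [0, h^*)$
define

$$x_{\sigma,i} = \begin{cases} \frac{2\sigma}{h^*}\, x^*_{i-N+L} + \big(1 - \frac{2\sigma}{h^*}\big)\Delta_{+\infty}, & \sigma \in [0, \frac{h^*}{2}], \\ a_{\sigma,i}, & \sigma \in (\frac{h^*}{2}, h^*), \end{cases}$$

where for $\sigma \in (\frac{h^*}{2}, h^*)$,

$$a_{\sigma,i} = \alpha(\sigma)\, x^*_{i - \lceil (2 - \frac{2\sigma}{h^*})(N - L) \rceil} + (1 - \alpha(\sigma))\, x^*_{i - \lceil (2 - \frac{2\sigma}{h^*})(N - L) \rceil + 1},$$

$$\alpha(\sigma) = \Big((N - L)\Big(2 - \frac{2\sigma}{h^*}\Big)\Big) \bmod (1).$$

Finally, $c_\sigma = c_{h = h^*} = c^*$. ■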
Discussion:

(i) Notice that in the above definition when $\sigma$ approaches
$h^*$, then $x_{\sigma,i}$ approaches $x^*_i$.

(ii) In the definition above, we keep the channel constant
across the sections and over $\sigma$. In other words, the
channel remains constant for all the constellations in the
family.

We denote the two partitions in the interpolation as
phases, e.g., $(h^*/2, h^*)$ corresponds to phase I and $[0, h^*/2]$
corresponds to phase II.

(iii) The above interpolation might look complicated. But
there is a straightforward interpretation. Think of one-sided
constellations. We are interested in a constellation
of size $L$.

In phase I, the basic idea is to "move" the constellation
$x^*$ to the right and at each point in time to "chop off" the
overhanging parts both on the left and on the right. We
do this until the left most section of $x^*$ is at position
$-L$. If $x^*$ were a continuous function, i.e., suppose
we had a continuum of sections, then this would be
all we need to do. But $x^*$ is discrete, so in order to
get a continuous interpolation we interpolate between
two consecutive elements of $x^*$. This mimics the "wave
effect" we mentioned in the beginning.

In phase II, the residual constellation is uniformly
brought down to $\Delta_{+\infty}$ in each section.
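The two phases of the interpolation in Definition 62 can be sketched in a few lines of code. The sketch below is only an illustration under simplifying assumptions: the densities are replaced by scalars (as for the BEC, where a density is described by its erasure probability and $\Delta_{+\infty}$ corresponds to 0), and the formula is evaluated away from the isolated values of $\sigma$ at which $(N - L)(2 - 2\sigma/h^*)$ is an integer. The function name and the example constellation are hypothetical.

```python
import math

def interpolate_constellation(x_star, N, L, h_star, sigma):
    """Return [x_{sigma,i} for i = -L..0] from a one-sided constellation.

    x_star[k] holds the scalar value of section k - N, i.e. sections -N..0.
    """
    def sec(j):
        # Scalar value of x*_j for j in [-N, 0].
        return x_star[j + N]

    out = []
    for i in range(-L, 1):
        if sigma <= h_star / 2:
            # Phase II: scale the tail towards Delta_{+inf} (scalar value 0).
            scale = 2.0 * sigma / h_star
            out.append(scale * sec(i - N + L))
        else:
            # Phase I: shift the constellation and interpolate linearly
            # between two consecutive sections.
            t = (2.0 - 2.0 * sigma / h_star) * (N - L)
            shift = math.ceil(t)
            a = t % 1.0
            out.append(a * sec(i - shift) + (1.0 - a) * sec(i - shift + 1))
    return out

# Illustrative increasing constellation on sections -6..0.
x_star = [0.0, 0.1, 0.2, 0.3, 0.4, 0.4, 0.4]
N, L, h_star = 6, 3, 0.5
print(interpolate_constellation(x_star, N, L, h_star, 0.0))    # all Delta_{+inf}
print(interpolate_constellation(x_star, N, L, h_star, 0.25))   # the tail of x*
print(interpolate_constellation(x_star, N, L, h_star, 0.375))  # halfway shift
```

As $\sigma$ increases from 0 to $h^*/2$ the tail of $x^*$ grows uniformly, and as $\sigma$ increases further towards $h^*$ the window slides to the right until it shows the sections $-L, \dots, 0$ of $x^*$.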
In the next lemma we show that if we have an interpolated
family constructed via the above definition, then the resulting
family is a family of approximate FPs.

Lemma 63 (Interpolation Yields Approximate FP Family):
Let $(c^*, x^*)$, $c^* \in \{c_h\}$, denote an increasing one-sided
constellation on $[-N, 0]$ with free or fixed boundary
condition for the parameters $(d_l, d_r, w)$ and let $w < L < N$.

Assume that (c*,x*) fulfills the following conditions, for 

some < 5 < ^. 

(i) Constellation is close to A +00 "on the left": 

(ii) Constellation is flat "on the right" 

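$$x^*_{-L} = x^*_{-L+1} = \dots = x^*_0 = x.$$

Also, $d(x^*_{-L-w+1}, x) \le \delta$.

(iii) Constellation is approximate FP: For $i \in [-N, 0]$,

$$d(x^*_i, c^* \circledast g(x^*_{i-w+1}, \dots, x^*_{i+w-1})) \le \delta.$$

Let $\{c_\sigma, x_\sigma\}_{\sigma=0}^{h^*}$ denote the family as described in
Definition 62. Then this family is an approximate FP family. More
precisely, for $\underline{\sigma} = 0$ and $\bar{\sigma} = h^*$

(i) $\{c_\sigma\}_{\underline{\sigma}}^{\bar{\sigma}}$ and $\{x_\sigma\}_{\underline{\sigma}}^{\bar{\sigma}}$ are ordered by degradation, increasing,
and piece-wise linear,

(ii) $x_{\sigma,i} = \Delta_{+\infty}$ for $i \notin [-L, L]$ and for all $\sigma$, and

(iii) for any $\sigma \in [\underline{\sigma}, \bar{\sigma})$ and any $i \in [-L + w - 1, -w + 1] \cup
[w - 1, L - w + 1]$

$$d(x_{\sigma,i}, c_\sigma \circledast g(x_{\sigma,i-w+1}, \dots, x_{\sigma,i+w-1})) \le \frac{2(d_l - 1)(d_r - 1)}{w} + \delta.$$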

Discussion: For the boundary $[-L, -L + w - 2] \cup [L - w + 2, L]$
and in the middle $[-w + 2, w - 2]$ the interpolation does not in
general result in an approximate FP. Fortunately this does not
cause problems. We will see in Theorem 64 that each section
gives only a small contribution to the GEXIT integral. If we
choose $L$ sufficiently large then we can safely ignore a fixed
number of sections.
Proof:

(i) That $\{c_\sigma\}_{\underline{\sigma}}^{\bar{\sigma}}$ and $\{x_\sigma\}_{\underline{\sigma}}^{\bar{\sigma}}$ are ordered by degradation,
increasing, and piece-wise linear follows by construction.

(ii) In the same way, that $x_{\sigma,i} = \Delta_{+\infty}$ for $i \notin [-L, L]$ and
for all $\sigma$ also follows by construction.

(iii) It remains to check that the family so defined constitutes
an approximate FP family. Since the family, by definition,
is symmetric around the section 0, we check only
the sections belonging to $[-L + w - 1, -w + 1]$.

Phase I: Think of $i$ and $\sigma$ as fixed, $i \in [-L + w - 1, -w + 1]$.
Define $\alpha = \alpha(\sigma)$ and $j = i - \lceil (2 - \frac{2\sigma}{h^*})(N - L) \rceil$.
Set $z^*_j = \alpha x^*_j + \bar{\alpha} x^*_{j+1}$. With these conventions, we
want to bound

$$d(z^*_j, c_{h^*} \circledast g(z^*_{j-w+1}, \dots, z^*_{j+w-1})).$$

Using the convexity property of (the Wasserstein
metric) Lemma 13, it is sufficient to bound

$$d(x^*_j, c_{h^*} \circledast g(z^*_{j-w+1}, \dots, z^*_{j+w-1})), \text{ and}$$
$$d(x^*_{j+1}, c_{h^*} \circledast g(z^*_{j-w+1}, \dots, z^*_{j+w-1}))$$

separately. The two bounds are identical and their
derivation is also essentially identical. Let us therefore
concentrate on the first expression. Using first the
triangle inequality and then the regularity properties
as well as the convexity property of Lemma 13, we
upper bound the first expression by

$$d(c_{h^*} \circledast g(x^*_{j-w+1}, \dots, x^*_{j+w-1}),\ c_{h^*} \circledast g(z^*_{j-w+1}, \dots, z^*_{j+w-1}))$$
$$+\ d(x^*_j, c_{h^*} \circledast g(x^*_{j-w+1}, \dots, x^*_{j+w-1}))$$

1=0 k=Q 

i w — 1 - w — 1 , 

Z=0 k=0 

r,/ j ,1 W — 1 -, W — 1 

1=0 k=0 

w — 1 
n , , -, \ / i \ W — 1 «U — 1 III — 1 

< 2W - 1>( >- 1) E<E^.E^)^ 



((!- 



Define 



;=0 fc=o 



fe=0 



2(d l)(d iy' +lu_1 w_1 

■ — X! rf E x '-'='5I cx i-fe +<!x i-fe+i) +5 



w- 



2(dj-l)(dr-l) 



i+w— l 

c 51 d(x*-u,+i»x* + i)+<5 



2(d,-l)(d r -l) 
< l-d, 

where to obtain the first inequality we use the approximate
nature of $x^*$ and in the last step we have used
the boundedness property of Lemma 13.

Phase II: In this regime we interpolate the "tail" of
the original constellation uniformly to $\Delta_{+\infty}$. From the
assumption of the lemma we have f &{x*_ N+L ) < 5. 
Since x* is increasing we must have 95(x<_jv+i) < 
s B(xl A r +i ) for i £ [— L, 0]. Lemma [T3l property (lull) , 
then implies that d(x*_ N+L , A+oo) < S for all i £ 
[~L,0]. 

Again, think of i and a as fixed, i £ [—L+w—l,—w+ 
1]. Set c = 2er/h* and j = i-N + L. Then 

^(cX^+cA+oojCh^fffo^u^i+cAioo,. . .jO^.j.+ cA+oo) 

< dicXj+cA^o, A +OQ ) 

+ d(A +oc , c h »©.g(cx*_ uH _ 1 +cA foo ,. . .,0^ fl0 _ 1 +cA +oo ) 

< 2(dj - l)(d r - l)Jc+c<5 

*<V» 2(d, - l)(d r - 1) 



< 



5, 



where to obtain the penultimate inequality we use 
Lemma [33] to bound the distance of c h * ®g{cx*^ mJ _ l + 
cA +00 ,...,ckJ^ 1 + cA +oc ) to A +00 (= c h . © 
g(A +ao , . . . , A +oc ), since A +QO is always an FP of 
DE) and the second expression is the distance of 
cx* + cA +oc to A +oc , which is bounded using the 
previous arguments. 

■ 

Next, we show that if we have an approximate family of
FPs, then the area under the GEXIT integral associated to the
family depends only on the "end points" of the interpolated
family.

Theorem 64 (Area Theorem for Approx. FP Family):
Let $\{c_\sigma, x_\sigma\}_{\underline{\sigma}}^{\bar{\sigma}}$ denote an approximate FP family for the
$(d_l, d_r, L, w)$ ensemble. More precisely,

(i) $\{c_\sigma\}_{\underline{\sigma}}^{\bar{\sigma}}$ and $\{x_\sigma\}_{\underline{\sigma}}^{\bar{\sigma}}$ are ordered by degradation, increasing,
and piece-wise linear,

(ii) $x_{\sigma,i} = \Delta_{+\infty}$ for $i \notin [-L, L]$ and for all $\sigma$,

(iii) $x_{\underline{\sigma},i} = x_{\underline{\sigma}}$ for $i \in [-L, L]$,

(iv) $x_{\bar{\sigma},i} = x_{\bar{\sigma}}$ for $i \in [-L, L]$, and

(v) for all $i \in [-L + w - 1, -w + 1] \cup [w - 1, L - w + 1]$
and $\sigma \in [\underline{\sigma}, \bar{\sigma}]$



8 In fact, we will apply this theorem to the family given in Definition 62.
More generally, however, given a set of distinct ordered densities
$a_1 \prec a_2 \prec \dots \prec a_n$, we get a piece-wise linear family by linearly
interpolating always between consecutive densities.



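Let
$$A(\{c_\sigma, x_\sigma\}_{\underline{\sigma}}^{\overline{\sigma}}) = \sum_{i=-L}^{L} G(\{c_\sigma, g(x_{\sigma,i-w+1}, \dots, x_{\sigma,i+w-1})\}_{\underline{\sigma}}^{\overline{\sigma}}),$$
where $G(\{c_\sigma, g(x_{\sigma,i-w+1}, \dots, x_{\sigma,i+w-1})\}_{\underline{\sigma}}^{\overline{\sigma}})$ is the GEXIT integral introduced in Definition 23. Let
$$A(x) = H(x) + \Bigl(d_l - 1 - \frac{d_l}{d_r}\Bigr) H(x^{\boxast d_r}) - (d_l - 1) H(x^{\boxast d_r - 1}).$$
Then $A(\{c_\sigma, x_\sigma\}_{\underline{\sigma}}^{\overline{\sigma}})$ is well defined and
$$\Bigl| \frac{A(\{c_\sigma, x_\sigma\}_{\underline{\sigma}}^{\overline{\sigma}})}{2L+1} - A(x_{\overline{\sigma}}) + A(x_{\underline{\sigma}}) \Bigr| \le b(d_l, d_r, \delta, w, L),$$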



where 



b(di, d r , 6, w, L) 



l\w{l + did r ) 



4(V2 + — d,(d r -l))VS. 
In 2 



2L + 1 

Discussion: In words, the theorem says that for any family of spatial FPs which start and end at a constant (over all sections) FP, the GEXIT integral is given by the end-points and is close to the difference of the $A$ expression introduced in Lemma 26. In fact, from Lemma 26 we see that, graphically, this is equal to the area under the BP GEXIT curve of the underlying ensemble between the two end-points.

Proof: Let us consider the circular ensemble which is associated to $(d_l, d_r, L, w)$ (see Definition 31). As defined in the statement of the theorem, for $i \in [-L, L]$, the channel "seen" at position $i$ is $c_{\sigma,i} = c_\sigma$. For the remaining sections $i \in [L+1, L+w-1]$ we impose the "natural" condition $c_{\sigma,i} = \Delta_{+\infty}$. As a consequence, for these positions $x_{\sigma,i} = \Delta_{+\infty}$.

Since $\{c_\sigma\}$ as well as $\{x_\sigma\}$ are piece-wise linear, all GEXIT integrals are well defined (see the proof of Lemma 26). Consequently, $A(\{c_\sigma, x_\sigma\}_{\underline{\sigma}}^{\overline{\sigma}})$ is well-defined.

Instead of determining $A(\{c_\sigma, x_\sigma\}_{\underline{\sigma}}^{\overline{\sigma}})$ directly, let us determine the equivalent quantity associated to the circular ensemble, i.e., we include the $w-1$ extra positions $[L+1, L+w-1]$. Since for all "extra" positions the associated channel is constant, and so the additional integrals are zero, the numerical value of these two unnormalized GEXIT integrals is in fact identical.

We will now derive upper and lower bounds for the GEXIT integrals for the given approximate FP family. Recall: for $i \in [-L+w-1, -w+1] \cup [w-1, L-w+1]$ we have a $\delta$-approximate (in the Wasserstein metric) FP family. For $i \in [-L, -L+w-2] \cup [-w+2, w-2] \cup [L-w+2, L]$ all we know is that the channel is a monotone function of $\sigma$. Finally, for $i \in [L+1, L+w-1]$ the channel is frozen to "perfect." Let us start by deriving a lower bound.

Boundary: For $i \in [-L, -L+w-2] \cup [-w+2, w-2] \cup [L-w+2, L]$ the GEXIT integral is non-negative. Thus, in this regime, we get a lower bound by setting each GEXIT integral to 0 (cf. Lemma 10).
Interior: Consider the GEXIT integrals for $i \in [-L+w-1, -w+1] \cup [w-1, L-w+1]$.

Technique: Rather than evaluating these integrals directly we use the technique introduced in [108], i.e., we consider the computation tree of height 2 rooted in node $i$ as shown in Figure 9 for the specific case $(d_l = 2, d_r = 4)$. More precisely, there are $d_l$ check nodes connected to this root variable node and $(d_r - 1)$ further variable nodes connected to each such check node. So in total there are $d_l$ check nodes in this tree and $1 + d_l(d_r - 1)$ variable nodes. We call the starting variable node the root and all other variable nodes leaves. By symmetry it suffices to consider one branch of this computation tree in detail. Let $j$, $j \in [i, i+w-1]$, denote the position of a particular check node. We assume that the choice of $j$ is done uniformly over this interval. Let $k_l$, $l \in [1, d_r - 1]$, $k_l \in [j-w+1, j]$, denote the position of the $l$-th variable node attached to this check node, and let the index of the root node be 0. For the leaf nodes we assume again a uniform choice of $k_l$ over the allowed interval. Note that, wlog, we have set the position $l = 0$ for the root variable node. For each computation tree assign to its root node the channel $c_{\sigma,i}$, whereas each leaf variable node at position $k_l$ "sees" the channel $x_{\sigma,k_l}$. Note that for our model of the tree, the distribution (averaged over this choice) which flows into the root node is exactly $g(x_{\sigma,i-w+1}, \dots, x_{\sigma,i+w-1})$, as required for the computation of $A(\{c_\sigma, x_\sigma\}_{\underline{\sigma}}^{\overline{\sigma}})$.

Fig. 9. Computation tree of height 2 for (2, 4)-regular LDPC ensemble. The top node is the root; the bottom variable nodes are the leaves.

Let us describe the basic trick which will help us to 
accomplish the computation. We will first determine 
the sum of all GEXIT integrals associated to such 
a tree. From this we will then subtract the GEXIT 
integrals associated to its leaf nodes. This will give 
us the GEXIT integral associated to the root node, 
which is what we are interested in. 
More precisely, we use (37). The lhs of this equation gives us the contribution of the overall tree and the rhs contains the GEXIT integral of the root node plus the GEXIT integrals of the leaf nodes. For the current case, we stress that all the operations (integrals of derivatives and partial derivatives) in (37) are well-defined since the family we consider is piece-wise linear.

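Contributions from overall tree: Recall that for $i \in [-L, L]$, $x_{\underline{\sigma},i} = x_{\underline{\sigma}}$ and $x_{\overline{\sigma},i} = x_{\overline{\sigma}}$.

Consider first the case $\sigma = \overline{\sigma}$ and $i \in [-L+w-1, -w+1] \cup [w-1, L-w+1]$. From Lemma 54 we know that the conditional entropy $H(X \mid Y)$ of the tree code is given by
$$H(\tilde{x}_{\overline{\sigma}}) + d_l(d_r - 1) H(x_{\overline{\sigma}}) - H(\tilde{x}_{\overline{\sigma}} \boxast x_{\overline{\sigma}}^{\boxast d_r - 1}) - (d_l - 1) H(x_{\overline{\sigma}}^{\boxast d_r - 1}),$$
where $\tilde{x}_{\overline{\sigma}} = c_{\overline{\sigma}} \circledast (x_{\overline{\sigma}}^{\boxast d_r - 1})^{\circledast d_l - 1}$. Now recall that $d(x_{\overline{\sigma}}, \tilde{x}_{\overline{\sigma}}) \le \delta$. Define $T(x)$ as
$$(1 + d_l(d_r - 1)) H(x) - H(x^{\boxast d_r}) - (d_l - 1) H(x^{\boxast d_r - 1}).$$
Then (dropping the subscripts $\overline{\sigma}$ for a moment),
\begin{align*}
|H(X \mid Y) - T(x)| &\le |H(\tilde{x}) - H(x)| + |H(\tilde{x} \boxast x^{\boxast d_r - 1}) - H(x^{\boxast d_r})| \\
&\overset{\text{Lem. 13 (ix)}}{\le} h_2(d(\tilde{x}, x)/2) + h_2(d(\tilde{x} \boxast x^{\boxast d_r - 1}, x^{\boxast d_r})/2) \\
&\overset{\text{Lem. 13 (vii)}}{\le} 2 h_2(d(\tilde{x}, x)/2) \overset{(25)}{\le} 4\sqrt{d(\tilde{x}, x)/2} \le 2\sqrt{2\delta}.
\end{align*}
Exactly the same argument tells us that the entropy of such a tree for $\sigma = \underline{\sigma}$ is, up to a possible error of size $2\sqrt{2\delta}$, equal to $T(x_{\underline{\sigma}})$. We conclude: the difference of the total entropy of such a tree is lower bounded by $T(x_{\overline{\sigma}}) - T(x_{\underline{\sigma}}) - 4\sqrt{2\delta}$, call this $B - 4\sqrt{2\delta}$.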
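The last two steps of the entropy chain above rest on the elementary bound $h_2(p) \le 2\sqrt{p}$ and on $d(\tilde{x}, x) \le \delta$. The following Python sketch checks that chain numerically on a grid; the value of `delta` and the helper names are illustrative assumptions, not taken from the paper.

```python
import math


def h2(p: float) -> float:
    """Binary entropy function in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def tree_entropy_error_bound(delta: float) -> float:
    """Right-hand side 2*sqrt(2*delta) of the chain for one end-point."""
    return 2.0 * math.sqrt(2.0 * delta)


# Illustrative value only: any delta in (0, 1] can be substituted.
delta = 0.05
grid = [delta * k / 1000 for k in range(1001)]  # candidate distances d <= delta

# Check 2*h2(d/2) <= 4*sqrt(d/2) <= 2*sqrt(2*delta) for every d on the grid.
chain_holds = all(
    2 * h2(d / 2) <= 4 * math.sqrt(d / 2) + 1e-12
    and 4 * math.sqrt(d / 2) <= tree_entropy_error_bound(delta) + 1e-12
    for d in grid
)
```

Since the bound is applied once for each end-point of the family, the total error on the difference of tree entropies is twice `tree_entropy_error_bound(delta)`, i.e. $4\sqrt{2\delta}$.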
Contributions from leaves: We need to find the contributions of GEXIT integrals associated to all the leaf nodes of each such tree rooted at a position $i \in [-L+w-1, -w+1] \cup [w-1, L-w+1]$. The exact such sum is difficult to determine. But we only need an upper bound to derive a lower bound on the overall GEXIT integral. Note that GEXIT integrals are non-negative. Hence, let us compute the sum of GEXIT integrals of leaf nodes of all computation trees, whether they are rooted in a position $i \in [-L+w-1, -w+1] \cup [w-1, L-w+1]$ or not.

By symmetry, this contribution is easy to determine. More precisely, consider the following equivalent procedure. Pick a check node at position $j$, $j \in [-L, L+w-1]$. Every check node has $d_r$ connected variable nodes, where each variable node is picked with uniform probability and independently from the range $[j-w+1, j]$ and the choice of the $d_r$ variables is iid (note that the connections are taken on the circular ensemble).

Contributions from checks in the range $[-L, -L+2w-3] \cup [-w+2, 2w-3] \cup [L-w+2, L+w-1]$: Check nodes in this range might see some frozen channels or channels which do not form approximate FPs. Hence we upper bound all GEXIT integrals associated to check nodes in this range by 1 (cf. Lemma 16). The number of such integrals is $(7w-8)d_l(d_r-1)$.

Contributions from checks in the range $[-L+2w-2, -w+1] \cup [2w-2, L-w+1]$: Check nodes in this range only see channels which are approximate FPs and none of the channels are frozen. There are $(2L-6w+8)d_l(d_r-1)$ such integrals. Let us determine the contribution for each such integral. Since we consider an average over all possible computation trees, the (average) density entering a check node is equal for all the leaf nodes (there are $d_r - 1$ such densities). Let us call this density $x_\sigma$. If we focus on a check node at position $j$, this density is equal to
$$\frac{1}{w} \sum_{k=0}^{w-1} x_{\sigma,j-k}.$$
However, the density entering the check node, at position $j$, from the root node will be different from $x_\sigma$, since we do not have a family of true FPs. Call this density $\tilde{x}_\sigma$. This density is equal to
$$\frac{1}{w} \sum_{k=0}^{w-1} c_\sigma \circledast g(x_{\sigma,j-k-w+1}, \dots, x_{\sigma,j-k+w-1}).$$
Since we assumed that we have an approximate FP family and due to the convexity of the Wasserstein metric, we conclude that $d(x_\sigma, \tilde{x}_\sigma) \le \delta$. Let us define $P(x) = H(x) - \frac{1}{d_r} H(x^{\boxast d_r})$. From Lemma 53 we have
that -P(x) is the GEXIT integral of a leaf node if 
we had a true FP. Since we have an approximate FP, 
each such integral can be upper bounded by P(x— ) — 
P(xa) + ^^26, call it C+ V26. We derive this 
as follows. We want to bound the difference 



H( 



dx^ 
da 



)da 



H( 



dx g 
da 



z a )da 



where z a = x® dr 1 



= x? d -- 2 ffl x„ 



and z 

Since the family $\{x_\sigma\}$ is piece-wise linear, we use (37) (applied in this case to the single parity-check code), Lemma 53 and symmetry to conclude that $\int_{\underline{\sigma}}^{\overline{\sigma}} H\bigl(\frac{dx_\sigma}{d\sigma} \circledast z_\sigma\bigr)\, d\sigma = P(x_{\overline{\sigma}}) - P(x_{\underline{\sigma}})$. Since the family $\{x_\sigma\}$ is piece-wise linear and ordered by degradation, we can reparameterize the GEXIT integrals with the Battacharyya parameter which we denote by $b = \mathfrak{B}(x_\sigma)$. Thus



H(^©(z b -z b ))d& 
b d& 



- (2L - 6w + 8)d t (d r - 1)C + 
„ ' 

contributions of approx. FP channels 

- (7w-8)di{d r - 1) + 
» , ' 

frozen and non FP contributions 

- (2L - 6w + 8)di(d r - 1)t^V6 

In 2 

v v ' 

correction due to approx. FP nature 

>(2L + l)(A(x F )-A(x^)) + A 



where 



D = —{Aw — 5)B- (7w - 8)di(d r - 1) 



> -lltu(l + did T ) since B < 1 + did r 

- AVS{2L + 1)[V2 + T^rdi{d r - 
In 2 



1) 



m "-\x^- 2 mx b ) 



To see the last inequality, using (jE}, Lemma |2TI we 
have 



H((x b ,-x b )©(z b -z b )) < — — s B(x b ,-x b V2d(z b ,z b ), 
ln(2) 

where x b -< x b /. Since S(x b /) = b' and 03 (x b ) = 



b, we get 



H((x 6 /-x b )®(z b -Zj,)) 



which gives us the bound. The last expression can 
be further upper bounded (using (Iviib . Lemma \13[ 

25. 



by J^^/2d(x b ,x b )< 1^ 
Accounting: Putting everything together, we have 

{2L - 4w + 6) (B — 4V2S) 

nb. of interior nodes sum of GEXIT integrals per tree 



Let us derive an upper bound in the same manner.

Boundary: For $i \in [-L, -L+w-2] \cup [-w+2, w-2] \cup [L-w+2, L]$ the GEXIT integrals are at most 1. This gives a contribution of $4w-5$. As usual, for $i \in [L+1, L+w-1]$ the GEXIT integral is 0 and does not contribute to the area.
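The position counts used in both the lower and the upper bound ($4w-5$ boundary roots, $2L-4w+6$ interior roots, $(7w-8)$ and $(2L-6w+8)$ check positions) can be verified by direct enumeration. The Python sketch below does this; the function names are hypothetical, and it assumes $L$ is large enough compared to $w$ that the listed intervals are non-empty and disjoint.

```python
def interval_len(lo: int, hi: int) -> int:
    """Number of integers in [lo, hi] (0 if the interval is empty)."""
    return max(0, hi - lo + 1)


def position_counts(L: int, w: int) -> dict:
    """Count root and check positions in the regimes used in the proof."""
    boundary_roots = (
        interval_len(-L, -L + w - 2)
        + interval_len(-w + 2, w - 2)
        + interval_len(L - w + 2, L)
    )
    interior_roots = (
        interval_len(-L + w - 1, -w + 1)
        + interval_len(w - 1, L - w + 1)
    )
    boundary_checks = (
        interval_len(-L, -L + 2 * w - 3)
        + interval_len(-w + 2, 2 * w - 3)
        + interval_len(L - w + 2, L + w - 1)
    )
    interior_checks = (
        interval_len(-L + 2 * w - 2, -w + 1)
        + interval_len(2 * w - 2, L - w + 1)
    )
    return {
        "boundary_roots": boundary_roots,    # expected 4w - 5
        "interior_roots": interior_roots,    # expected 2L - 4w + 6
        "boundary_checks": boundary_checks,  # expected 7w - 8
        "interior_checks": interior_checks,  # expected 2L - 6w + 8
    }


counts = position_counts(L=50, w=5)
```

The root positions add up to $2L+1$ (all sections in $[-L, L]$) and the check positions add up to $2L+w$ (all positions in $[-L, L+w-1]$), which is a quick consistency check on the accounting.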

Interior: Consider the GEXIT integrals for $i \in [-L+w-1, -w+1] \cup [w-1, L-w+1]$.

Technique: We use the same procedure as beforehand. But this time we need a lower bound of the GEXIT integrals of the leaf nodes.
Contributions from overall tree: As before, the overall contribution of each tree is equal to $T(x_{\overline{\sigma}}) - T(x_{\underline{\sigma}})$ plus an error term of absolute value at most $4\sqrt{2\delta}$.
Contributions from leaves: The idea is the same as before and, as before, we will consider the computation from the point of view of check nodes. As before, we split the contribution in two regimes, $[-L, -L+2w-3] \cup [-w+2, 2w-3] \cup [L-w+2, L+w-1]$ and $[-L+2w-2, -w+1] \cup [2w-2, L-w+1]$.
Contributions from checks in the range $[-L, -L+2w-3] \cup [-w+2, 2w-3] \cup [L-w+2, L+w-1]$: Check nodes in this range might see some frozen channels or channels which are not approximate FPs. Since we are looking for an upper bound, we set the contribution of such check nodes to be 0.
Contributions from checks in the range $[-L+2w-2, -w+1] \cup [2w-2, L-w+1]$: As we discussed before, check nodes in this range only see channels which are approximate FPs and none of the channels are frozen. Further, all these GEXIT integrals correspond to computation trees whose root $i$ is in the range $[-L+w-1, -w+1] \cup [w-1, L-w+1]$. We can, therefore, subtract all their contributions, which are obtained by arguments similar to those used in the
lower bound. There are (2L — 6w + 8)di(d r — 1) such 
integrals and the contribution for each such integral 
is at least C — t\ VS. Here, the last term takes into 
account the approximate FP nature of the channels 
and C was defined in the arguments for obtaining 
the lower bound. 






Accounting: We have 

(Aw - 5) + (2L - Aw + 6) (B + Ay/26) + 

boundary nb. interior nodes total contribution per tree 

- (2L -6w + 8)di(d r - 1)C + 

S V ' 

contr. of interior check nodes 

+ (2L -6w + 8)d ; (d r - 1) j^V^ 
v v ' 

correction due to approx. FP nature 

<(2L + l)(A(x^)-A(x 2 ))+E, 
where 




< 6wdid r since C < <4wdid r 



+ aVs~(2L + 1)[V2+ — di(d r - 1)]. 

■ 

Proof of Theorem 47: Rather than deriving the bound $c(d_l, d_r, \delta, w, K, L)$ for all values of the parameters, we are only interested in the behavior of this bound for values of $\delta$ tending to 0 and values of $K$ and $L$ tending to $\infty$. Hence, in the sequel, nothing is lost by assuming at several spots that $\delta$ is "sufficiently" small and $K$ and $L$ are "sufficiently" large (consequently $N$ is also sufficiently large). This will simplify our arguments significantly.

Let $(c^*, x^*)$ denote the proper one-sided FP on $[-N, 0]$ with forced boundary condition which fulfills the stated conditions for some $\delta > 0$ and $2(w-1) \le L$ and $L + w \le K \le N$. We prove the claim in several steps, where in each step we assert further properties that such a FP has to fulfill.

Constellation is almost flat and not too small "on the right": Recall that by assumption $\mathfrak{B}(x^*_{-K}) \ge x_u(1)$ so that $\mathfrak{B}(x^*_i) \ge x_u(1)$ for $i \in [-K, 0]$. Using the same reasoning as in the discussion at the end of Lemma 14 we can conclude that there exists an $i^* \in [-K, -L-w]$ such that $D(x^*_j, x^*_k) \le D(x^*_{i^*}, x^*_j) + D(x^*_j, x^*_k) + D(x^*_k, x^*_{i^*+L+w}) = D(x^*_{i^*}, x^*_{i^*+L+w}) \le \frac{2(L+w)}{K}$ for all $j \le k$ and $j, k \in [i^*, i^*+L+w]$. From part (ii) of Lemma 14 we conclude that $d(x^*_j, x^*_k) \le \sqrt{8(L+w)/K}$ for all $i^* \le j \le k \le i^*+L+w$. Clearly, the right-hand side can be made arbitrarily small by picking $K$ sufficiently larger than $L + w$.
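To get a feel for how large $K$ has to be, the bound $\sqrt{8(L+w)/K}$ can be evaluated directly. The following Python sketch is only an illustration: the function names and the sample values of $L$, $w$ and the target distance are assumptions, not parameters from the paper.

```python
import math


def kappa(L: int, w: int, K: int) -> float:
    """Flatness bound sqrt(8 (L + w) / K) on the Wasserstein distance."""
    return math.sqrt(8.0 * (L + w) / K)


def min_K_for(L: int, w: int, target: float) -> int:
    """Smallest integer K (up to rounding) with sqrt(8 (L + w) / K) <= target."""
    return math.ceil(8.0 * (L + w) / target**2)


# Hypothetical sample values, chosen only to show the scaling in K.
K_needed = min_K_for(L=100, w=3, target=0.1)
```

The required $K$ grows linearly in $L + w$ and quadratically in the inverse of the target distance.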

Constellation can be made exactly flat and not too small "on the right": Create from $(c^*, x^*)$ the increasing constellation $(c^*, z^*)$ on $[-N, 0]$ with free boundary condition in the following way,
$$z^*_i = \begin{cases} x^*_i, & i \in [-N, i^*+w], \\ x^*_{i^*+w}, & i > i^*+w. \end{cases}$$
The graphical interpretation is simple. We replace the "almost" flat part on the right plus the extra part on the right which might not be flat with an exactly flat part. To simplify our subsequent notation we set $x = x^*_{i^*+w}$ and from the above arguments note that $\mathfrak{B}(x) \ge x_u(1)$. Hence $\mathfrak{B}(z^*_i) \ge x_u(1)$ for all $i \ge i^*+w$.
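The flattening step can be illustrated with scalars standing in for densities (for instance erasure probabilities on the BEC, which is an assumption made only for this sketch). The list `x` holds the values at positions $-N, \dots, 0$, and the hypothetical helper `flatten_right` copies the value at position $i^* + w$ to all positions to its right.

```python
def flatten_right(x: list, i_star: int, w: int) -> list:
    """Return z with z_i = x_i for i <= i*+w and z_i = x_{i*+w} for i > i*+w.

    x lists the constellation values at positions -N, ..., 0, so position i
    is stored at list index i + N.
    """
    N = len(x) - 1
    cut = i_star + w + N  # list index of position i* + w
    if not 0 <= cut <= N:
        raise ValueError("position i* + w must lie in [-N, 0]")
    return x[: cut + 1] + [x[cut]] * (N - cut)


# Toy increasing constellation on positions -5, ..., 0 (illustrative values).
x_star = [0.0, 0.1, 0.2, 0.3, 0.35, 0.4]
z_star = flatten_right(x_star, i_star=-4, w=2)  # flat from position -2 onwards
```

The output stays increasing whenever the input is increasing, and it differs from the input only to the right of position $i^* + w$.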

Constellation is approximate FP: Note that by going from $x^*$ to $z^*$ no component in $[-N, i^*+L+w]$ is changed by more than a distance $\kappa = \sqrt{8(L+w)/K}$. Therefore, if we run DE on the modified components it is clear that in this range the output must still be close to the original output. More precisely, we have for every $i \in [-N, i^*+L+1]$
\begin{align*}
d(z^*_i, c^* \circledast g(z^*_{i-w+1}, \dots, z^*_{i+w-1}))
&\le d(z^*_i, x^*_i) + d(x^*_i, c^* \circledast g(z^*_{i-w+1}, \dots, z^*_{i+w-1})) \\
&\le \kappa + d(c^* \circledast g(x^*_{i-w+1}, \dots, x^*_{i+w-1}), c^* \circledast g(z^*_{i-w+1}, \dots, z^*_{i+w-1})) \\
&\le \kappa + 2(d_l - 1)(d_r - 1)\kappa,
\end{align*}
where to get the penultimate inequality we first replace $x^*_i$ by $c^* \circledast g(x^*_{i-w+1}, \dots, x^*_{i+w-1})$, since $x^*$ is a true FP, and then to obtain the last inequality we apply Lemma 33. Since $\kappa$ can be made arbitrarily small by choosing $K$ sufficiently large, this verifies the approximate FP nature for $i \in [-N, i^*+L+1]$. Let us now focus on $i \in [i^*+L+2, 0]$. Note that since $L \ge 2(w-1)$, we can use the above argument in particular for $i = i^*+2w-1$. For this choice of $i$ all involved densities, $z^*_{i-w+1}, \dots, z^*_{i+w-1}$, are equal to $x$. Therefore, the previous argument shows that
$$d(x, c^* \circledast g(x, \dots, x)) \le \kappa + 2(d_l - 1)(d_r - 1)\kappa. \qquad (43)$$
But for $i \ge i^*+w$ all components of $z^*$ are equal to $x$ and so the approximate FP nature of $z^*$ is also verified for $i \ge i^*+2w-1$. Since $i^*+2w-1 \le i^*+L+1$, we conclude that $z^*$ is an approximate FP.

From FP to FP family: From the approximate FP $(c^*, z^*)$ on $[-N, 0]$ we create the approximate FP family $\{c^*_\sigma, z^*_\sigma\}_{\sigma=0}^{h^*}$ on $[-L, 0]$ as described in Definition 62.

Computing GEXIT integral - Definition 23: Using the basic definition of the GEXIT functional in Definition 23 we conclude that the GEXIT integral associated to $\{c^*_\sigma, z^*_\sigma\}_{\sigma=0}^{h^*}$, $A(\{c^*_\sigma, z^*_\sigma\}_{\sigma=0}^{h^*})$, is 0 since the channel remains constant throughout the interpolation.

Computing GEXIT integral - Theorem 64: We now compute the GEXIT integral associated to $\{c^*_\sigma, z^*_\sigma\}_{\sigma=0}^{h^*}$ by first applying Lemma 63 and then Theorem 64.

More precisely, from the previous arguments we satisfy all 
the hypotheses of Lemma [63] This allows us to conclude 
that the FP family constructed above is ^'-^dr-i) _|_ ^ 
approximate FP (cf. d42l ) if K is chosen sufficiently large. 
Furthermore, since the starting (z* =h * ; = x for all sections 
i E [— L, 0]) and ending constellations (z* =0 i = A +oc for all 
sections i 6 [— L, 0]) are flat, we satisfy all the hypotheses of 
Theorem [64] from which we conclude that the GEXIT integral 
is upper bounded by A(x) + b(di , d r , 2(d '~ 1)(rf '-~ 1) + 5, w, L)@ 

Flat region has entropy not much smaller than 4 s -: From 
part dviiib Lemma [59] we get 

4(1) > (dr-iy 2 ^ > K-l)- 3 + (|) ^ 

where in the last step we have used condition (Iviib in Defini- 
tion |40] We conclude that 

H(x) > 58 2 (x)>^(l)>(d r -l)- 3 +(|) . (44) 
9 Note that $A(\Delta_{+\infty}) = 0$.
We now proceed by contradiction. Let us assume that H(x) <
|l - d ; e~ 4(dr ~ 1)(TT ^t )3 - j-. As we just discussed, 

d(x,c*®(x^ i r'- 1 )< 2(d '' 1)(4 ' 1) + ( 5<( ln( y ) 2 . 

In the last step we assumed without loss of generality 
that 5 is chosen sufficiently small. The inequality then fol- 
lows from the condition <(vj in Definition [40] This, to- 
gether with d44b . guarantees that we satisfy the hypoth- 
esis of (the Negativity) Lemma [27] Hence we conclude 
that A(x) < — -T-. From condition ([vT]) in Definition [40] 

4(V2 + £zdi(d r - i ))v / ^-iK^-D ; < Hence for a 
sufficiently small (5 and a sufficiently large L, this leads 
to the conclusion that the GEXIT integral A({c* a , z*}%) < 
A(x) + b(d h d r , 2(d '^ 1 |, (dr ~ 1) + 5, w, L) < 0, a contradiction 
to the previous computation. As a consequence, we must have 

h* = H(c*) >H(x) > ^-^e" 4 ^- 1 )^) 1 -!. (45) 

The flat region is close to $x^{BP}$: We will now show that $x$ is close to $x^{BP}(c^*)$, the BP FP when transmitting over the channel $c^*$ using the underlying $(d_l, d_r)$-regular ensemble. In the sequel we will denote $x^{BP}(c^*)$ by $x^{BP}$. To do this, we will first bound the Wasserstein distance between $x^{BP}$ and $\tilde{x}$, where $\tilde{x}$ is defined to be equal to $x^*_{i^*+L+w}$. Thus to bound the distance between $x$ and $x^{BP}$ we bound the distances $d(x, \tilde{x})$ and $d(\tilde{x}, x^{BP})$. Note from the previous part we have that $d(x, \tilde{x}) = d(x^*_{i^*+w}, x^*_{i^*+L+w}) \le \kappa$ and hence the distance between $x$ and $\tilde{x}$ can be made arbitrarily small by taking $K$ sufficiently large. Let us now bound $d(\tilde{x}, x^{BP})$. First, we show that $d(\tilde{x}, c^* \circledast g(\tilde{x}, \dots, \tilde{x}))$ can be made arbitrarily small. Indeed,
\begin{align*}
d(\tilde{x}, c^* \circledast g(\tilde{x}, \dots, \tilde{x})) &\le d(\tilde{x}, x) + d(x, c^* \circledast g(x, \dots, x)) + d(c^* \circledast g(x, \dots, x), c^* \circledast g(\tilde{x}, \dots, \tilde{x})) \\
&\le \kappa + \kappa + 4(d_l - 1)(d_r - 1)\kappa, \qquad (46)
\end{align*}
where to get the last inequality we have used the approximate FP nature of $x$ (cf. (43)) and the (sensitivity) Lemma 33. Since $\kappa$ can be made arbitrarily small, we can make the distance $d(\tilde{x}, c^* \circledast g(\tilde{x}, \dots, \tilde{x}))$ as small as desired.

Run forward DE, with the channel $c^*$, starting from $x_0 = \tilde{x}$, $x^{BP}_0 = x^{BP}$, and $w_0 = \Delta_0$, respectively. Let $x_\ell = T_{c^*}(x_{\ell-1})$, $x^{BP}_\ell = T_{c^*}(x^{BP}_{\ell-1}) = x^{BP}$, and $w_\ell = T_{c^*}(w_{\ell-1})$, $\ell \ge 1$. Recall that $T_{c^*}(\cdot)$ is the DE operator for the $(d_l, d_r)$-regular ensemble when transmitting over the channel $c^*$. We will choose the value of $\ell$ shortly. Then
\begin{align*}
d(\tilde{x}, x^{BP}) &\le d(x_0, x_\ell) + d(x_\ell, w_\ell) + d(w_\ell, x^{BP}) \\
&\le \sum_{j=0}^{\ell-1} d(x_j, x_{j+1}) + 2\sqrt{\mathfrak{B}(w_\ell) - \mathfrak{B}(x_\ell)} + 2\sqrt{\mathfrak{B}(w_\ell) - \mathfrak{B}(x^{BP})}.
\end{align*}
In the last step we use that $w_\ell \succ x_\ell$, since $w_0 = \Delta_0 \succ x_0$ and DE preserves degradation. Similarly, we use $w_\ell \succ x^{BP}$. Therefore we can upper bound the Wasserstein distance in terms of the difference of the respective Battacharyya constants according to (ii), Lemma 14.



Choose $\ell = \lfloor \frac{L}{w-1} \rfloor$. We then claim that $x \prec x_j$ for all $0 \le j \le \ell$. Let us prove this claim immediately. From construction, we have $x \prec x^*_{i^*+L+w} = x_0$. Next, we claim that $x_j \succ x^*_{i^*+L+1-(w-1)(j-1)}$ for $1 \le j \le \ell$. Before we prove this claim, we apply it immediately to conclude that
$$x_j \succ x^*_{i^*+L+1-(w-1)(j-1)} \succ x^*_{i^*+w} = x.$$
To prove the intermediate claim we argue inductively that
\begin{align*}
x_j &= c^* \circledast g(x_{j-1}, \dots, x_{j-1}) \\
&\succ c^* \circledast g(x^*_{i^*+L+1-(w-1)j}, \dots, x^*_{i^*+L+1-(w-1)(j-2)}) \\
&= x^*_{i^*+L+1-(w-1)(j-1)}.
\end{align*}
The induction is completed by verifying that $x_1 \succ x^*_{i^*+L+1}$. Indeed, from the monotonicity of the spatial FP, $x^*$, we get
$$x^*_{i^*+L+1} = c^* \circledast g(x^*_{i^*+L-w+2}, \dots, \underbrace{x^*_{i^*+L+w}}_{= x_0}) \prec c^* \circledast g(x_0, \dots, x_0) = x_1. \qquad (47)$$



Let us now bound the distance $d(x_j, x_{j+1})$ for $1 \le j \le \ell$. Since these elements are derived by DE we can use our bounds on how the Wasserstein distance behaves under DE (cf. (viii), Lemma 13) to conclude that $d(x_j, x_{j+1}) \le \alpha\, d(x_{j-1}, x_j)$,
where a = 2(d ; - l)(d r - 1)(1 - Q5 2 (x))^. To obtain 
$\alpha$ we have used $x_j \succ x$ for all $0 \le j \le \ell$ to get $\min\{\mathfrak{B}(x_{j-1}), \mathfrak{B}(x_j)\} \ge \mathfrak{B}(x)$. Continuing with the above inequality, it is not hard to see that we get $d(x_j, x_{j+1}) \le \alpha^j d(x_0, x_1)$. This gives a bound of



e-i 



^d(xj,x i+ i) < d(x ,xi) 



o 



1 



1 

< d(x ,xi) 



Lemma [^1 

where in the last inequality we use 03 (x) > H(x) > 
4 s - - d/e" 4(dr " 1)(rn; t )3 - -j- combined with the condition 

fly 1 d r 

<(TTJ> in Definition l40l to get a < 1. From (l46l l we know that 
we can make c?(xo,Xi) as small as we want by choosing K 
sufficiently large. 

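Let us now bound the two terms containing Battacharyya parameters. Note that in each iteration the distance of the respective Battacharyya constants decreases by a factor of at least $\beta = \mathfrak{B}(c^*)(d_l - 1)(d_r - 1)(1 - \min\{\mathfrak{B}(x), \mathfrak{B}(x^{BP})\}^2)^{d_r - 2}$. Indeed, from Lemma 51
\begin{align*}
\mathfrak{B}(w_\ell) - \mathfrak{B}(x_\ell) &\le \bigl(\mathfrak{B}(c^*)(d_l - 1)(d_r - 1)(1 - \mathfrak{B}(x)^2)^{d_r - 2}\bigr)^{\ell}, \\
\mathfrak{B}(w_\ell) - \mathfrak{B}(x^{BP}) &\le \bigl(\mathfrak{B}(c^*)(d_l - 1)(d_r - 1)(1 - \mathfrak{B}(x^{BP})^2)^{d_r - 2}\bigr)^{\ell}.
\end{align*}
For the first inequality we again use $\mathfrak{B}(x_j) \ge \mathfrak{B}(x)$ for all $0 \le j \le \ell$. Above we have also used $\mathfrak{B}(w_0) - \mathfrak{B}(x_0) = \mathfrak{B}(\Delta_0) - \mathfrak{B}(\tilde{x}) \le 1$ and $\mathfrak{B}(w_0) - \mathfrak{B}(x^{BP}) = \mathfrak{B}(\Delta_0) - \mathfrak{B}(x^{BP}) \le 1$. We now have
\begin{align*}
\mathfrak{B}(c^*)(d_l - 1)(d_r - 1)(1 - \mathfrak{B}(x)^2)^{d_r - 2} &< 1, \\
\mathfrak{B}(c^*)(d_l - 1)(d_r - 1)(1 - \mathfrak{B}(x^{BP})^2)^{d_r - 2} &< 1.
\end{align*}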



For the first inequality we use condition <[ixj> m Defini- 

Lemma ^ 

tion [40] combined with «8(x) > H(x) > f- - 
die 4<,dr 1 ^ii=<ir) T? — -L. For the second inequality we use 



condition mnji in Definition 



combined with h* > — 



d;e _ 4 (d r -i)( TT ^ : )7 _ i_ > £ and Lemma rjg] 

Therefore we can bound the sum of the two Battacharyya terms by $4\beta^{\ell/2}$ with $\beta = \mathfrak{B}(c^*)(d_l - 1)(d_r - 1)(1 - \min\{\mathfrak{B}(x), \mathfrak{B}(x^{BP})\}^2)^{d_r - 2} < 1$.

Putting everything together we conclude that by choosing $L$, $K$ sufficiently large $d(x, x^{BP})$ can be made as small as desired.
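The structure of this bound is a geometric sum in a contraction factor plus a term that decays exponentially in the number of DE iterations. The Python sketch below evaluates that structure for placeholder values: `alpha`, `beta`, `d01` (standing for $d(x_0, x_1)$) and `ell` are illustrative assumptions, not constants computed from an actual ensemble.

```python
def geometric_sum(alpha: float, ell: int) -> float:
    """Sum of alpha^j for j = 0, ..., ell - 1."""
    return sum(alpha**j for j in range(ell))


def distance_bound(d01: float, alpha: float, beta: float, ell: int) -> float:
    """d01 * sum_{j < ell} alpha^j + 4 * beta^(ell / 2).

    The first term bounds the sum of the distances d(x_j, x_{j+1}); the
    second term is the bound on the two Battacharyya terms.
    """
    return d01 * geometric_sum(alpha, ell) + 4.0 * beta ** (ell / 2.0)


# Placeholder numbers: contraction factors below 1 and a small first step.
bound_short = distance_bound(d01=1e-3, alpha=0.5, beta=0.25, ell=4)
bound_long = distance_bound(d01=1e-3, alpha=0.5, beta=0.25, ell=40)
```

For contraction factors below 1 the geometric sum never exceeds $1/(1-\alpha)$, so the first term is controlled by $d(x_0, x_1)$ (small for large $K$), while the second term vanishes as $\ell$ grows (large $L$).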

h* is close to h. A : From Theorem l64l we have 

or , ~ A ( x ) < 0{ d h d r, \-S,W,L). 

2L + 1 to 
From above arguments we have yl({c*,z*}^) = hence 

2{di - l)(dr - 1) 



|A(x)| <6(dj,dr 

Using the formula for A(-) given in Lemma [ 
( Iviil l and {ix]i given in Lemma [T3l we have 



<5, to, L). 
>]and properties 



|A(x BP ) - A(x)| < 2V2Vrf(x,x BP ) 



dp(di -!-/) + v 7 ^ - l(di - r 



Recall that x BP = x BP (c*). Combining, we get 



|A(X BP )| <b(d h d r , 



2(d t - l)(d r - 1) 



2y/2Vs(l + ^d~ r {di - 1 - + Vdr - l(di - 1) 



+ 5, iu, L) 



Further the BP GEXIT value for all channels between h* 
and h A is lower bounded by 2(d z rrxs To show this we first 
note that from condition (ITTTb and (Iviiili in Definition [40] we 



satisfy the hypotheses of Lemma [29] Hence from Lemma [29] 
we have h A > h. Also, from (l45l l we have h* > h. 

Then for any h > minjh" 4 , h*} we have S(x h ) > x u (l) 
(cf. Lemma [T8J. Thus we conclude that S(x h ) > x u (l) > 



(ci,.-l) 3 /2 

we have, 



for any h > min{h , h*}. Denoting y h 



.Bdr — 1 



, concavity of GEXIT 

G(c h ,yf d > 2£(y® d ') 

extremes of info., 
mult. prop, of Batta / 

> 1 - Jl - (<B(y h )) 2d ' 




1 



> 



1 



(dp - l) 3 " 2(dr- l) 3 ' 

To obtain (a) we use 93 (x h ) = 93(c h )(93(y h )) di " 1 , since 
Ch and x n form a FP pair. This implies that (93(y n )) 2d ' = 

cn { \ 2d i 2d i 2d i Lemma [591 

(Ig)^- 1 > (93(x h ))^ > (x u (l))^ > {dr _ 

1) d i' 2 > (d r — 1)~ 3 . The last inequality follows since 
condition (Iviil l in Definition [40] implies that di > 6. This 
implies 



/ G(c h ,y® d 0dh 
Jh A 



> h* 



1 



'2(d r -l) 3 ' 



Since h* and h A are both greater than h, from Lemma [26] we 
have 

" G(c h) y® d! )dh =|A(x BP )-A(x^)| = |A(x BP )|, 

where the last equality follows since $A(x^{BP}_{h^A}) = 0$ (cf. Lemma 29).

Putting everything together we get 

2{di - l)(d r - 1) 



\h* -h A \<2(d r -iy ib(d h d, 



+ S, w, L) 



+ 2V2\/J(1 + y/T r (d l - 1 - + ^Jd r - l(di - 1))) . 

Appendix K
Existence of FP - Theorem 48

Proof: Before proceeding to the main part of the proof, let us show that if we assume that there exists a proper FP on $[-N, 0]$, with forced boundary condition on the right and $\Delta_{+\infty}$ on the left ($i < -N$) and with Battacharyya parameter of the constellation (cf. Definition 37) equal to $x_u(1)/2$, then the desired properties (i) and (ii) mentioned in the statement of the theorem follow.

Constellation is close to $\Delta_{+\infty}$ "on the left": Let $N_1$ be the largest integer so that for all $i \le -N + N_1$, $\mathfrak{B}(x_i) \le \delta$. We
have a proper FP and w > 2dfd 2 (because w is by assumption 
admissible in the sense of condition (iv) in Definition 40). Hence by applying (the Transition Length) Lemma 61 we conclude that the number of sections with Battacharyya parameter bounded between $\delta$ and $x_u(1)$ is at most $w c(d_l, d_r)/\delta$, where $c(d_l, d_r)$ is the constant defined in Lemma 61. Since the Battacharyya parameter of the constellation is $x_u(1)/2$, we have
$$(N+1)\frac{x_u(1)}{2} \ge \bigl(N + 1 - N_1 - w c(d_l, d_r)/\delta\bigr)\, x_u(1).$$
This implies that $N_1 \ge (N+1)\bigl(\frac{1}{2} - \frac{w c(d_l, d_r)}{\delta (N+1)}\bigr)$. Using property (x) of (the Wasserstein metric) Lemma 13 we conclude that for all $i \le -N + N_1$, $d(x_i, \Delta_{+\infty}) \le \delta$.

Constellation is not too small "on the right": Let $N_1$ be as defined previously. Again, since the Battacharyya parameter of the constellation is equal to $x_u(1)/2$ we have
$$(N+1)\frac{x_u(1)}{2} \le N_1 \delta + (N + 1 - N_1),$$
where on the rhs above we have replaced the sections with value greater than $\delta$ by the maximum value of 1.

Thus if we define 



This implies that iVi < (N + 1) 



is 



N2 as the number of sections with Battacharyya parameter at 
least equal to x u (l), we must have 

N 2 > (N + 1) - Ni -wc(di,d r )/6 
'Xu(l) wc{di,d., 



>(#+!)(■ 



6(N + 1) 



where we used S < 



to obtain the above expression. 
It remains to show the existence of the proper FP itself, with Battacharyya parameter of the constellation equal to $x_u(1)/2$. We use the Schauder FP theorem in a strong form recently proved by Cauty [113]: This theorem states that every continuous map $f$ from a convex compact subset $S$ of a topological vector space to itself has a FP.

Recall that a topological vector space $\mathcal{S}$ is a vector space over a topological field $F$ (most often the real or complex numbers with their standard topologies) which is endowed with a topology such that vector addition $\mathcal{S} \times \mathcal{S} \to \mathcal{S}$ and scalar multiplication $F \times \mathcal{S} \to \mathcal{S}$ are continuous functions.

Let $\mathcal{S} = L_1[0, 1]$ (where $L_1$ denotes the $L_1$ norm). Note that $\mathcal{S}$ is a real normed vector space and hence a topological vector space. Let $\mathcal{P}$ denote the space of probability measures on $[0, 1]$ endowed with the Wasserstein metric. Note that $\mathcal{P} \subset \mathcal{S}$, where we represent elements of $\mathcal{P}$ by their cumulative distribution functions. Note that the topology on $\mathcal{P}$ induced by $\mathcal{S}$ coincides with our choice (cf. second alternative definition in part (i) of Lemma 13). Also, on $\mathcal{P}$ the topology induced by the Wasserstein metric is equivalent to the weak topology. Since $[0, 1]$ is a complete separable metric space, so is $\mathcal{P}$, see [104, Theorem 6.18]. Since $[0, 1]$ is compact, so is $\mathcal{P}$, see [104, Remark 6.19].

A Cartesian product of a family of topological vector spaces, when endowed with the product topology, is a topological vector space. Hence, $\mathcal{S}^{N+1}$, endowed with the product topology, is a topological vector space.

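Let $S$ be the subset
$$S = \bigl\{ |\mathfrak{X}| \in \mathcal{S}^{N+1} : |\mathfrak{X}|_i \text{ is a } |D|\text{-distribution},\ i \in [-N, 0];\ \mathfrak{B}(|\mathfrak{X}|) = x_u(1)/2;\ |\mathfrak{X}|_{-N} \prec |\mathfrak{X}|_{-N+1} \prec \dots \prec |\mathfrak{X}|_0 \bigr\}.$$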

Discussion: As we discussed above, we think of the elements of $\mathcal{P}$ as cumulative distribution functions. In particular, these are the cdfs in the so called $|D|$ domain. In the sequel, rather than only referring to cdfs it will often be more convenient to write down the $|D|$ distributions $|\mathfrak{X}|$ or $D$ distributions $x$, directly.

$S$ is non-empty: Setting all elements of $|\mathfrak{X}|$ equal to $\frac{x_u(1)}{2}\Delta_0 + \bigl(1 - \frac{x_u(1)}{2}\bigr)\Delta_1$ gives an element in this space.

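$S$ is convex: Let $x, y \in S$ with $|D|$-distributions given by $|\mathfrak{X}|$ and $|\mathfrak{Y}|$ respectively. Let $|\mathfrak{Z}| = \beta|\mathfrak{X}| + (1 - \beta)|\mathfrak{Y}|$ for some $\beta \in (0, 1)$. Since $\mathfrak{B}(\cdot)$ is a linear operator, we see that
$$\mathfrak{B}(|\mathfrak{Z}|) = \beta\,\mathfrak{B}(|\mathfrak{X}|) + (1 - \beta)\,\mathfrak{B}(|\mathfrak{Y}|) = x_u(1)/2.$$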

Also, using (01, we see that |t>|;-i ~< |o|» for all i e [-N + 
1,0]. Hence /3x+ (1 - /3)y g S. 

$S$ is closed: Consider a sequence $\{|\mathfrak{X}|^{(\ell)}\}_{\ell \ge 1}$ of elements of $S$ and assume that this sequence converges in the Wasserstein metric to a limit, call it $|\mathfrak{X}|^{(\infty)}$. We need to show that $|\mathfrak{X}|^{(\infty)} \in S$, i.e., we claim that $S$ is closed. In this respect, recall from our discussion above that $S \subset \mathcal{P}^{N+1}$ and that on $\mathcal{P}^{N+1}$ the topology induced by the Wasserstein metric is the weak topology.

From Lemma 4.25 in [62] we know that each component of $|\mathfrak{X}|^{(\infty)}$ is a symmetric $|D|$ distribution. It therefore remains to show that (i) $\mathfrak{B}(|\mathfrak{X}|^{(\infty)}) = x_u(1)/2$, and (ii) $|\mathfrak{X}|^{(\infty)}_{i-1} \prec |\mathfrak{X}|^{(\infty)}_i$ for all $i \in [-N+1, 0]$. Both claims follow from the fact that we can encode the above properties in terms of continuous functions and that continuous functions preserve the properties under limits.



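Let us show this in detail. We begin with (i). Consider the sequence $\mathfrak{B}(|\mathfrak{X}|^{(\ell)})$. We have
$$\mathfrak{B}(|\mathfrak{X}|^{(\ell)}) = \frac{1}{N+1} \sum_{j=-N}^{0} \int_0^1 \sqrt{1 - y^2}\; d|\mathfrak{X}|^{(\ell)}_j(y).$$
Now note that $\sqrt{1 - y^2}$ is a bounded and continuous function on $[0, 1]$. Therefore, (weak) convergence of $\{|\mathfrak{X}|^{(\ell)}\}$ to $|\mathfrak{X}|^{(\infty)}$ implies convergence of $\mathfrak{B}(|\mathfrak{X}|^{(\ell)})$ to $\mathfrak{B}(|\mathfrak{X}|^{(\infty)}) = x_u(1)/2$.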

Let us show (ii). From (O, ^ \v\j is equivalent to 

!l {Xlfl^dx < \\ \T\ { P{x)Ax for all z g [0, 1]. We have 

|X|g(x)dx< f \X\f°\x)dx+ 

J Z 

/ViS°°^)dx- f'mfi^dx 

J z J z 



,(oo) 



(x) dx 



X\ ( f\x)dx. 



(48) 



By assumption, the sequence {Ifl^- 1 } converges in the sense 
of the Wasserstein metric. Therefore from property dmb of 
Lemma [jj] for all j g [-N + 1,0], lim^^ \X\f\x) = 



\X\j'(x) for all x g [0, 1] such that |X|j° is continuous at 
x (in other words, weak convergence is equal to convergence 
in distribution). This implies that for all j 



lim 

i— >oo 



(x) dx 



,(oo) 



(x) dx 







so that from (l48b we conclude that 

/ \X\ { ™\{x)dx< / \X\ < f°\x)dx. 

J z J z 

$S$ is compact: Note that $S$ is a closed subset of $\mathcal{P}^{N+1}$, which is compact since it is the product of compact spaces. Hence $S$ is compact as well.

Definition of map $V(\cdot)$: In order to show (via Schauder's FP theorem) that $S$ contains a FP of DE we need to exhibit a continuous map which maps $S$ into itself. Our first step is to define a map, call it $V(|\mathfrak{X}|)$, which "approximates" the DE equation and is well-suited for applying the FP theorem. The final step in our proof is then to show that the FP of the map $V(|\mathfrak{X}|)$ is in fact a FP of DE itself.

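The map $V(|\mathfrak{X}|)$ is constructed as follows. For $|\mathfrak{X}| \in S$, let $U(|\mathfrak{X}|)$ be the map,
$$(U(|\mathfrak{X}|))_i = g(|\mathfrak{X}|_{i-w+1}, \dots, |\mathfrak{X}|_{i+w-1}), \quad i \in [-N, 0],$$
where $|\mathfrak{X}|_i = \Delta_{+\infty}$ for $i < -N$, and where $|\mathfrak{X}|_i = \Delta_0$ for $i > 0$. Define $V : S \to S$ as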

s.t. Q3(|c 



E%|)®|c| 



v(\t\) = 



_ x„(l) 

- 2®(l7(| r |))' 



x»(l)/2 < 2S(C/(|?|)), 



+ a(l?l)A , otherwise. 



In words, if $U(\underline{\mathsf{x}})$ is "too large", upgrade it by an
appropriate channel $\mathsf{c}$. If, on the other hand,
$U(\underline{\mathsf{x}})$ is "too small" then we take a convex combination
with $\Delta_0$. In the preceding expressions, terms like
$\bar{\alpha}(\underline{\mathsf{x}})\, U(\underline{\mathsf{x}})$ denote
component-wise products, i.e., the result is a vector of densities, where
the $i$-th component is the result of multiplying the $i$-th component of
$U(\underline{\mathsf{x}})$ with the scalar
$\bar{\alpha}(\underline{\mathsf{x}})_i$. Further, $\bar{\alpha}$ is a
shorthand for $(1 - \alpha(\underline{\mathsf{x}}))$.

It remains to specify the components of $\alpha(\underline{\mathsf{x}})$.
Note that $\alpha(\underline{\mathsf{x}}) \in [0, 1]^{N+1}$. Further, we
require that its components are increasing and that they are all either 0
or 1, except possibly one. I.e., $\alpha(\underline{\mathsf{x}})$ has the
form $(0, 0, \dots, 0, \alpha_i, 1, \dots, 1)$, where $i \in [-N, 0]$, and
$\alpha_i \in [0, 1]$. This defines the vector uniquely.
Pictorially we can think of this map in the following way. We
start at component $(U(\underline{\mathsf{x}}))_0$. We take an increasing
convex combination with $\Delta_0$ until the overall Battacharyya constant
is equal to $x_u(1)/2$. If this is not sufficient, then we set
$(V(\underline{\mathsf{x}}))_0 = \Delta_0$ and repeat this procedure with
component $(U(\underline{\mathsf{x}}))_{-1}$, and so on. To apply Schauder's
theorem, we need to show that the map $V(\cdot)$ is well-defined and
continuous.
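The "push up from the right" rule for $\alpha$ can be sketched on scalars. The code below is a minimal illustration, assuming that the Battacharyya constant of a constellation is the average of the per-section values and using the fact that the Battacharyya parameter is linear under convex combinations, so mixing a section with $\Delta_0$ with weight $\alpha_i$ gives the value $(1-\alpha_i) b_i + \alpha_i$. The function names and the sample numbers are hypothetical.

```python
def alpha_vector(b, target):
    """Compute alpha of the form (0, ..., 0, alpha_i, 1, ..., 1).

    b      : per-section Battacharyya values of U(x); b[0] is section -N,
             b[-1] is section 0 (hypothetical scalar stand-ins for densities)
    target : desired average Battacharyya value, mean(b) <= target <= 1
    """
    n = len(b)
    alpha = [0.0] * n
    deficit = target * n - sum(b)      # total increase still required
    if deficit <= 0:
        return alpha                   # nothing to push up
    for i in range(n - 1, -1, -1):     # start at the rightmost section
        gain = 1.0 - b[i]              # increase obtained by alpha_i = 1
        if gain >= deficit:
            alpha[i] = deficit / gain  # partial mixing closes the gap
            break
        alpha[i] = 1.0                 # section is replaced by Delta_0
        deficit -= gain
    return alpha

def mixed_values(b, alpha):
    """Battacharyya values after mixing: (1 - alpha_i) * b_i + alpha_i."""
    return [(1.0 - a) * bi + a for bi, a in zip(b, alpha)]

b = [0.0, 0.1, 0.2, 0.5]
alpha = alpha_vector(b, target=0.5)
after = mixed_values(b, alpha)
```

For these sample values the rightmost section is replaced entirely and the next one is mixed partially, so `alpha` has exactly one component strictly between 0 and 1.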

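Map $V(\cdot)$ is well defined: First consider the case
$\mathfrak{B}(U(\underline{\mathsf{x}})) \ge x_u(1)/2$. In this case
$\frac{x_u(1)}{2\,\mathfrak{B}(U(\underline{\mathsf{x}}))} \le 1$. Since the
Battacharyya parameter is a strictly increasing and continuous
function of the channel, there exists a unique
$\mathsf{c} \in \{\mathsf{c}_\sigma\}$ such that
$\mathfrak{B}(\mathsf{c}) = \frac{x_u(1)}{2\,\mathfrak{B}(U(\underline{\mathsf{x}}))}$.
Note also that $U(\underline{\mathsf{x}})$ is monotone (spatially) since
$g(\cdot)$ is monotonic (as a function of its arguments) and
$\underline{\mathsf{x}}$ is monotone. Consequently,
$U(\underline{\mathsf{x}}) \circledast \mathsf{c}$ is monotone. Further, from
the multiplicative property of the Battacharyya parameter at the variable
node, we get that
$\mathfrak{B}(V(\underline{\mathsf{x}})) = \mathfrak{B}(U(\underline{\mathsf{x}}))\, \mathfrak{B}(\mathsf{c}) = x_u(1)/2$.
It follows that in this case $V(\underline{\mathsf{x}}) \in S$.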
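Because the Battacharyya parameter is strictly increasing and continuous along an ordered channel family, the channel $\mathsf{c}$ in the first case can be located by bisection. The sketch below uses the BSC family as a stand-in (an assumption; the proof works with a general smooth ordered family), and the values chosen for $x_u(1)$ and $\mathfrak{B}(U(\underline{\mathsf{x}}))$ are hypothetical.

```python
import math

def bsc_battacharyya(p):
    """Battacharyya parameter of a BSC with crossover probability p."""
    return 2.0 * math.sqrt(p * (1.0 - p))

def solve_channel(target, b_of=bsc_battacharyya, lo=0.0, hi=0.5, iters=100):
    """Bisection for the parameter whose Battacharyya value equals target.

    Relies only on b_of being continuous and strictly increasing on [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if b_of(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_u1 = 0.4   # hypothetical value standing in for x_u(1)
b_U = 0.35   # hypothetical value of B(U(x)), at least x_u1 / 2
target = x_u1 / (2.0 * b_U)
p_star = solve_channel(target)
```

With this choice the product `b_U * bsc_battacharyya(p_star)` equals `x_u1 / 2` up to numerical precision, mirroring the multiplicative property used above.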

Consider next the case $\mathfrak{B}(U(\underline{\mathsf{x}})) < x_u(1)/2$.
If we choose $\alpha = 1$ then we get a Battacharyya parameter of 1. Further,
the increase in the Battacharyya parameter is continuous.
Hence there exists an $\alpha$ so that the resulting constellation has
Battacharyya constant equal to $x_u(1)/2$. Also, by construction
the resulting constellation is monotone. This shows that also
in this case $V(\underline{\mathsf{x}}) \in S$. In both the cases above, the
map maintains the symmetric nature of the $|D|$-distributions.

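We summarize, $V$ maps $S$ into itself. In the rest of the proof,
we will use the notation
$d(\underline{\mathsf{x}}, \underline{\mathsf{y}}) = \sum_{i=-N}^{0} d(\mathsf{x}_i, \mathsf{y}_i)$
to denote the Wasserstein distance between two constellations
$\underline{\mathsf{x}}$ and $\underline{\mathsf{y}}$.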
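For intuition, the constellation distance can be computed explicitly for discrete $|D|$-distributions. The sketch below assumes that the per-section metric is the 1-Wasserstein distance on $[0,1]$, evaluated through the standard one-dimensional identity as the $L_1$ distance between cumulative distributions; whether this matches the normalization of the metric defined earlier in the paper is an assumption. All function names are hypothetical.

```python
def wasserstein_1d(a, b):
    """L1 distance between the CDFs of two discrete distributions on [0, 1].

    a, b : lists of (point, weight) pairs with weights summing to 1
    """
    grid = sorted({0.0, 1.0, *[y for y, _ in a], *[y for y, _ in b]})
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both CDFs are constant on [left, right).
        fa = sum(w for y, w in a if y <= left)
        fb = sum(w for y, w in b if y <= left)
        total += abs(fa - fb) * (right - left)
    return total

def constellation_distance(xs, ys):
    """Sum of per-section distances, as in d(x, y) = sum_i d(x_i, y_i)."""
    return sum(wasserstein_1d(a, b) for a, b in zip(xs, ys))

bec = lambda e: [(0.0, e), (1.0, 1.0 - e)]   # BEC(e) in the |D|-domain
bsc = lambda p: [(1.0 - 2.0 * p, 1.0)]       # BSC(p) in the |D|-domain

d_bec = wasserstein_1d(bec(0.3), bec(0.5))
d_bsc = wasserstein_1d(bsc(0.1), bsc(0.2))
d_total = constellation_distance([bec(0.3), bsc(0.1)], [bec(0.5), bsc(0.2)])
```

In both examples the distance is 0.2: the two BECs differ by 0.2 in erasure mass, and the two BSC point masses sit 0.2 apart.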

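Continuity of map $V(\cdot)$: We will show that for every
$\underline{\mathsf{x}} \in S$ and for any $\epsilon > 0$, there exists a
$\nu > 0$ such that if $\underline{\mathsf{y}} \in S$ and
$d(\underline{\mathsf{x}}, \underline{\mathsf{y}}) \le \nu$, then
$d(V(\underline{\mathsf{x}}), V(\underline{\mathsf{y}})) \le \epsilon$. Note
that if $d(\underline{\mathsf{x}}, \underline{\mathsf{y}}) \le \nu$ then

(i) $d(U(\underline{\mathsf{x}})_i, U(\underline{\mathsf{y}})_i) \le 2(d_l - 1)(d_r - 1)\nu$, $i \in [-N, 0]$;

(ii) $|\mathfrak{B}(U(\underline{\mathsf{x}})_i) - \mathfrak{B}(U(\underline{\mathsf{y}})_i)| \le \sqrt{4(d_l - 1)(d_r - 1)\nu}$, $i \in [-N, 0]$;

(iii) $d(\mathsf{c}_{\underline{\mathsf{x}}}, \mathsf{c}_{\underline{\mathsf{y}}}) \le 2\sqrt{\frac{2(N+1)\sqrt{4(d_l-1)(d_r-1)\nu}}{x_u(1)}}$ if
$\mathfrak{B}(U(\underline{\mathsf{x}})) \ge x_u(1)/2$ and
$\mathfrak{B}(U(\underline{\mathsf{y}})) \ge x_u(1)/2$.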

10 That the Battacharyya parameter is continuous follows since the channel
family is smooth. Further, since the Battacharyya kernel is strictly concave
and the channel family is ordered by degradation, the Battacharyya parameter
is strictly increasing.

11 We abuse notation slightly to denote the channel associated to
$\underline{\mathsf{x}}$, $\underline{\mathsf{y}}$ by
$\mathsf{c}_{\underline{\mathsf{x}}}$, $\mathsf{c}_{\underline{\mathsf{y}}}$,
respectively, rather than denoting them by the standard
parameterization $\sigma$.



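Assertion (i) is equivalent to Lemma 33 since if
$d(\underline{\mathsf{x}}, \underline{\mathsf{y}}) \le \nu$ then a fortiori
$d(\mathsf{x}_i, \mathsf{y}_i) \le \nu$, $i \in [-N, 0]$. Assertion
(ii) follows from assertion (i) by applying property (ix) of
Lemma 13. To see assertion (iii) we write

$$|\mathfrak{B}(\mathsf{c}_{\underline{\mathsf{x}}}) - \mathfrak{B}(\mathsf{c}_{\underline{\mathsf{y}}})| = \Big| \frac{x_u(1)}{2\,\mathfrak{B}(U(\underline{\mathsf{x}}))} - \frac{x_u(1)}{2\,\mathfrak{B}(U(\underline{\mathsf{y}}))} \Big| \le \frac{x_u(1)}{2} \cdot \frac{|\mathfrak{B}(U(\underline{\mathsf{x}})) - \mathfrak{B}(U(\underline{\mathsf{y}}))|}{\mathfrak{B}(U(\underline{\mathsf{x}}))\, \mathfrak{B}(U(\underline{\mathsf{y}}))} \le \frac{2(N+1)\sqrt{4(d_l-1)(d_r-1)\nu}}{x_u(1)}.$$

The last inequality follows from assertion (ii) and
$\mathfrak{B}(U(\underline{\mathsf{x}})), \mathfrak{B}(U(\underline{\mathsf{y}})) \ge x_u(1)/2$.
Recall that the channel family is ordered by degradation. We can therefore
apply property (ii) of Lemma 14 to prove our claim.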

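Choosing $\nu$ as a function of $\underline{\mathsf{x}}$ and using
assertion (ii) above, we can therefore assume that either
$\mathfrak{B}(U(\underline{\mathsf{x}})) \ge x_u(1)/2$ and
$\mathfrak{B}(U(\underline{\mathsf{y}})) \ge x_u(1)/2$, or
$\mathfrak{B}(U(\underline{\mathsf{x}})) < x_u(1)/2$ and
$\mathfrak{B}(U(\underline{\mathsf{y}})) < x_u(1)/2$. In the first case,

$$d(V(\underline{\mathsf{x}}), V(\underline{\mathsf{y}})) = d(\mathsf{c}_{\underline{\mathsf{x}}} \circledast U(\underline{\mathsf{x}}),\ \mathsf{c}_{\underline{\mathsf{y}}} \circledast U(\underline{\mathsf{y}}))$$
$$\overset{\text{(vi), Lem. 13}}{\le} 2\, d(U(\underline{\mathsf{x}}), U(\underline{\mathsf{y}})) + 2\, d(\mathsf{c}_{\underline{\mathsf{x}}}, \mathsf{c}_{\underline{\mathsf{y}}})$$
$$\overset{\text{(i) \& (iii)}}{\le} 4(d_l - 1)(d_r - 1)\nu (N+1) + 4\sqrt{\frac{2(N+1)\sqrt{4(d_l-1)(d_r-1)\nu}}{x_u(1)}}.$$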
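The first-case estimate has the form $4(d_l-1)(d_r-1)\nu(N+1) + 4\sqrt{2(N+1)\sqrt{4(d_l-1)(d_r-1)\nu}/x_u(1)}$, which vanishes as $\nu \to 0$; this is what continuity requires. The sketch below evaluates that expression and searches for a suitable $\nu$ by repeated halving. The expression is taken as read from the display above, and the numerical value used for $x_u(1)$ is a hypothetical placeholder.

```python
import math

def case1_bound(nu, N, dl, dr, xu1):
    """Upper bound on d(V(x), V(y)) in the first case, as a function of nu."""
    base = (dl - 1) * (dr - 1) * nu
    linear_term = 4.0 * base * (N + 1)
    root_term = 4.0 * math.sqrt(2.0 * (N + 1) * math.sqrt(4.0 * base) / xu1)
    return linear_term + root_term

def nu_for(eps, N, dl, dr, xu1):
    """Halve nu until the bound drops below eps (terminates since bound -> 0)."""
    nu = 1.0
    while case1_bound(nu, N, dl, dr, xu1) > eps:
        nu /= 2.0
    return nu

params = dict(N=10, dl=3, dr=6, xu1=0.5)   # xu1 is a hypothetical placeholder
nu = nu_for(0.1, **params)
```

The fourth-root dependence on $\nu$ means that `nu` has to be many orders of magnitude smaller than the target `eps`, but some positive value always exists.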

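Let us now focus on the second case. Let $i^*$ denote the
largest integer in $[-N, 0]$ such that
$\bar{\alpha}(\underline{\mathsf{x}})_{i^*}$ is non-zero.
Clearly if $\mathfrak{B}(U(\underline{\mathsf{x}})) < x_u(1)/2$, then
$i^* \le 0$, else we set $i^* = 1$.
Similarly, let $j^*$ be the corresponding index in
$\alpha(\underline{\mathsf{y}})$. Let us denote
$\alpha(\underline{\mathsf{x}})_{i^*} = \alpha$ and
$\alpha(\underline{\mathsf{y}})_{j^*} = \beta$. Note that
$0 \le \alpha, \beta < 1$.
Wlog we can assume that $j^* \le i^*$. With this we can upper
bound $d(V(\underline{\mathsf{x}}), V(\underline{\mathsf{y}}))$ by,

$$\sum_{i=-N}^{j^*-1} d(U(\underline{\mathsf{x}})_i, U(\underline{\mathsf{y}})_i) + d(U(\underline{\mathsf{x}})_{j^*},\ \bar{\beta} U(\underline{\mathsf{y}})_{j^*} + \beta \Delta_0) + \sum_{i=j^*+1}^{i^*-1} d(U(\underline{\mathsf{x}})_i, \Delta_0) + d(\bar{\alpha} U(\underline{\mathsf{x}})_{i^*} + \alpha \Delta_0,\ \Delta_0). \tag{49}$$

Above we have used that for $i \ge i^* + 1$ we have
$V(\underline{\mathsf{x}})_i = V(\underline{\mathsf{y}})_i = \Delta_0$. In
the case $i^* = j^*$, the terms in the interval $[j^*, i^*]$ collapse to
$d(\bar{\alpha} U(\underline{\mathsf{x}})_{i^*} + \alpha \Delta_0,\ \bar{\beta} U(\underline{\mathsf{y}})_{i^*} + \beta \Delta_0)$.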

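Let us first consider the case when $j^* < i^*$. Note that
$\mathfrak{B}(V(\underline{\mathsf{x}})) = \mathfrak{B}(V(\underline{\mathsf{y}})) = x_u(1)/2$.
This implies that if we replace the Wasserstein distance by the
Battacharyya parameter in (49) the expression evaluates to 0. Then writing
the $j^*$ term as
$\bar{\beta}\big(\mathfrak{B}(U(\underline{\mathsf{x}})_{j^*}) - \mathfrak{B}(U(\underline{\mathsf{y}})_{j^*})\big) + \beta\big(\mathfrak{B}(U(\underline{\mathsf{x}})_{j^*}) - \mathfrak{B}(\Delta_0)\big)$
we get

$$\beta\big(1 - \mathfrak{B}(U(\underline{\mathsf{x}})_{j^*})\big) + \sum_{i=j^*+1}^{i^*-1} \big(1 - \mathfrak{B}(U(\underline{\mathsf{x}})_i)\big) + \big(1 - \mathfrak{B}(\bar{\alpha} U(\underline{\mathsf{x}})_{i^*} + \alpha \Delta_0)\big) \le \sum_{i=-N}^{0} |\mathfrak{B}(U(\underline{\mathsf{x}})_i) - \mathfrak{B}(U(\underline{\mathsf{y}})_i)|, \tag{50}$$

where above we use $\mathfrak{B}(\Delta_0) = 1$.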

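We now continue with (49). We use
$d(U(\underline{\mathsf{x}})_{j^*},\ \bar{\beta} U(\underline{\mathsf{y}})_{j^*} + \beta \Delta_0) \le \bar{\beta}\, d(U(\underline{\mathsf{x}})_{j^*}, U(\underline{\mathsf{y}})_{j^*}) + \beta\, d(U(\underline{\mathsf{x}})_{j^*}, \Delta_0)$,
property (x) of Lemma 13 and (50) to get the upper bound

$$\sum_{i=-N}^{0} d(U(\underline{\mathsf{x}})_i, U(\underline{\mathsf{y}})_i) + \sqrt{2(N+1)}\, \sqrt{\sum_{i=-N}^{0} |\mathfrak{B}(U(\underline{\mathsf{x}})_i) - \mathfrak{B}(U(\underline{\mathsf{y}})_i)|}.$$

Finally using assertions (i) and (ii) above we get that

$$d(V(\underline{\mathsf{x}}), V(\underline{\mathsf{y}})) \le 2(N+1)(d_l - 1)(d_r - 1)\nu + 2(N+1)\big((d_l - 1)(d_r - 1)\nu\big)^{1/4}.$$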

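For the case when $j^* = i^*$ we have

$$d(\bar{\alpha} U(\underline{\mathsf{x}})_{i^*} + \alpha \Delta_0,\ \bar{\beta} U(\underline{\mathsf{y}})_{i^*} + \beta \Delta_0)$$
$$\le d(\bar{\alpha} U(\underline{\mathsf{x}})_{i^*} + \alpha \Delta_0,\ \bar{\alpha} U(\underline{\mathsf{y}})_{i^*} + \alpha \Delta_0) + d(\bar{\alpha} U(\underline{\mathsf{y}})_{i^*} + \alpha \Delta_0,\ \bar{\beta} U(\underline{\mathsf{y}})_{i^*} + \beta \Delta_0)$$
$$\le d(U(\underline{\mathsf{x}})_{i^*}, U(\underline{\mathsf{y}})_{i^*}) + d(\bar{\alpha} U(\underline{\mathsf{y}})_{i^*} + \alpha \Delta_0,\ \bar{\beta} U(\underline{\mathsf{y}})_{i^*} + \beta \Delta_0).$$

Wlog we can assume $\beta \ge \alpha$. This implies
$\bar{\alpha} U(\underline{\mathsf{y}})_{i^*} + \alpha \Delta_0 \prec \bar{\beta} U(\underline{\mathsf{y}})_{i^*} + \beta \Delta_0$.
Hence from (ii) of Lemma 14 we can bound the second Wasserstein distance
above by the difference of the Battacharyya parameters. Further,

$$|\mathfrak{B}(\bar{\alpha} U(\underline{\mathsf{y}})_{i^*} + \alpha \Delta_0) - \mathfrak{B}(\bar{\beta} U(\underline{\mathsf{y}})_{i^*} + \beta \Delta_0)|$$
$$\le |\mathfrak{B}(\bar{\alpha} U(\underline{\mathsf{y}})_{i^*} + \alpha \Delta_0) - \mathfrak{B}(\bar{\alpha} U(\underline{\mathsf{x}})_{i^*} + \alpha \Delta_0)| + |\mathfrak{B}(\bar{\alpha} U(\underline{\mathsf{x}})_{i^*} + \alpha \Delta_0) - \mathfrak{B}(\bar{\beta} U(\underline{\mathsf{y}})_{i^*} + \beta \Delta_0)|.$$

The first Battacharyya difference on the rhs can be bounded
by $|\mathfrak{B}(U(\underline{\mathsf{y}})_{i^*}) - \mathfrak{B}(U(\underline{\mathsf{x}})_{i^*})|$.
For the second difference we use the same arguments as in (50) to obtain

$$|\mathfrak{B}(\bar{\alpha} U(\underline{\mathsf{x}})_{i^*} + \alpha \Delta_0) - \mathfrak{B}(\bar{\beta} U(\underline{\mathsf{y}})_{i^*} + \beta \Delta_0)| \le \sum_{i=-N}^{0} |\mathfrak{B}(U(\underline{\mathsf{x}})_i) - \mathfrak{B}(U(\underline{\mathsf{y}})_i)|.$$

Combining everything with the assertions (i) and (ii), in this
case we get

$$d(V(\underline{\mathsf{x}}), V(\underline{\mathsf{y}})) \le 2(N+1)(d_l - 1)(d_r - 1)\nu + 2\sqrt{2}\sqrt{N+1}\, \big((d_l - 1)(d_r - 1)\nu\big)^{1/4}.$$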

Existence of FP of $V(\cdot)$ via Schauder: We can invoke
Schauder's FP theorem to conclude that $V(\cdot)$ has a FP in $S$,
call it $\underline{\mathsf{x}}^*$.

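Existence of FP of DE ($U(\cdot)$): Let us show that, as a
consequence, DE itself has a FP
$(\mathsf{c}^*, \underline{\mathsf{x}}^*)$ with the desired properties.

If $\mathfrak{B}(U(\underline{\mathsf{x}}^*)) \ge x_u(1)/2$, then
$\underline{\mathsf{x}}^* = V(\underline{\mathsf{x}}^*) = U(\underline{\mathsf{x}}^*) \circledast \mathsf{c}^*$
with $\mathsf{c}^* \in \{\mathsf{c}_\sigma\}$. Hence indeed,
$(\mathsf{c}^*, \underline{\mathsf{x}}^*)$ is a FP of DE.

Consider hence the case
$\mathfrak{B}(U(\underline{\mathsf{x}}^*)) < x_u(1)/2$. We will
show that it leads to a contradiction. Recall that in this case

$$\underline{\mathsf{x}}^* = (1 - \alpha(\underline{\mathsf{x}}^*))\, U(\underline{\mathsf{x}}^*) + \alpha(\underline{\mathsf{x}}^*)\, \Delta_0, \tag{51}$$

and that $\mathsf{x}_i^* = \Delta_0$ for $i \ge 1$.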

Given a density $\mathsf{x}$ we say that it has a "BEC component"
of $u$ if $\mathsf{x}$ contains a delta at 0 of "weight" $u$ (i.e., contains
a mass of $u$ at $\Delta_0$). In the sequel we will think of $u$ as the
erasure probability of a binary erasure channel.

Let $u$ be the vector of BEC components corresponding to
$\underline{\mathsf{x}}^*$. Since
$\mathfrak{B}(U(\underline{\mathsf{x}}^*)) < x_u(1)/2$ we know that $u$ has
some non-trivial components in $[-N, 0]$, and by definition of the
right boundary, $u_i = 1$ for $i > 0$. We claim that for $i \in [-N, 0]$,

$$u_i \ge g(u_{i-w+1}, \dots, u_{i+w-1}). \tag{52}$$

Let us prove this claim immediately. Extract the BEC component
from both the left-hand as well as the right-hand side of
(51). This gives

$$u_i = (1 - \alpha_i)\, \mathrm{BEC}(U(\underline{\mathsf{x}}^*)_i) + \alpha_i \ge (1 - \alpha_i)\, g(u_{i-w+1}, \dots, u_{i+w-1}) + \alpha_i, \tag{53}$$

where we wrote $\alpha_i$ as a shorthand for
$\alpha(\underline{\mathsf{x}}^*)_i$ and $\mathrm{BEC}(\cdot)$
denotes weight at $\Delta_0$. To see the second step, i.e., to see that
$\mathrm{BEC}(U(\underline{\mathsf{x}}^*)_i) \ge g(u_{i-w+1}, \dots, u_{i+w-1})$,
let $\underline{\mathsf{d}}^*$ denote the density at the output of the check
nodes when the input is $\underline{\mathsf{x}}^*$.
Let $v$ denote the (BEC) density at the output of the check
nodes when the input is $u$. Some thought shows that $v$ is also
the BEC component of $\underline{\mathsf{d}}^*$. In words, at check nodes
the BEC component evolves according to density evolution - we get an
erasure at the output of a check node if and only if at least one
of the incoming messages is an erasure. At variable nodes we
only get a bound. If all inputs to a variable node are erasures
then the output is also an erasure, but this is only a sufficient
condition. Thus (53) is proved. If $\alpha_i = 1$, then $u_i = 1$ and (52)
is true. If $\alpha_i < 1$, then
$u_i \ge \frac{u_i - \alpha_i}{1 - \alpha_i} \ge g(u_{i-w+1}, \dots, u_{i+w-1})$,
where the second step follows from (53).

Extend the constellation u by A^ 3 = \(N + l)-3^— ] + 1 

7 1 

sections on the right, with values equal to 1, and let denote 
this constellation. We claim that u' ' has at least 

v ; V2 S(N + 1) 

sections on the left with Battacharyya value between and 6 
where c(di, d r ) is the constant of Lemma loTI and only depends 
on the dd. 

To prove this claim, we consider our original
$\underline{\mathsf{x}}^*$ (before we extracted the BEC components) which
was the FP obtained by Schauder's theorem. We claim that
$\underline{\mathsf{x}}^*$ has at least $N_4$
segments on the left with Battacharyya constant at most $\delta$,
where

$$N_4 \ge \underbrace{(N+1)}_{(a)} - \underbrace{\frac{N+1}{2}}_{(b)} - \underbrace{\frac{c(d_l, d_r)\, w}{\delta}}_{(c)}.$$

Let us explain each of the terms on the right. There are $N + 1$
segments to start with, which explains (a). At most $(N+1)/2$
sections on the right can have a Battacharyya value of $x_u(1)$
or larger (since $\mathfrak{B}(\underline{\mathsf{x}}^*) = x_u(1)/2$). This
accounts for the (b) term. Finally, all sections $i$, with
$i \le -(N+1)/2 + 1$, must be sections where $\mathsf{x}_i^*$ fulfills the
actual FP equations, i.e., these cannot be sections where the map
$V(\cdot)$ "pushes" the constellation up to $\Delta_0$. More precisely, we
must have $\alpha(\underline{\mathsf{x}}^*)_i = 0$ for
$i \le -(N+1)/2 + 1$. Indeed, from
construction, starting from the rightmost section, each section
is increased all the way up to $\Delta_0$ before we move on to
the next section on the left. Since the constellation
$\underline{\mathsf{x}}^*$ has Battacharyya parameter equal to
$x_u(1)/2 < 1/2$ we conclude that for $i \le -(N+1)/2 + 1$ we must have
$\mathsf{x}_i^* = (U(\underline{\mathsf{x}}^*))_i$,
which is a true FP of DE for the channel $\Delta_0$. Therefore, for
these sections we can apply (the Transition Length) Lemma 61
and conclude that there are at most $c(d_l, d_r) w/\delta$ such sections
which have Battacharyya value between $\delta$ and $x_u(1)$. This is
the term (c).

The claim now follows since the BEC component $u_i$ is
upper bounded by the corresponding Battacharyya parameter.
Now consider a further constellation $v^{(0)}$ on $[-N, N_3]$. We
set $v_i^{(0)} = 0$ for all $i \in [-N, 0]$. For $i \in [1, N_3]$ we set
$v_i^{(0)}$ to the FP of forward DE according to Lemma 22 in [53], where
the length of the constellation is taken to be N 3 — 1, e = 1, 
and \ = ^(1 — di/dr). More precisely, Lemma 22 in |l53| says 
that if we run forward DE, with free boundary condition, when 
transmitting over the BEC with e = 1 and (di,d r ,N 3 — 1 , w) 
coupled ensemble, then for large enough length, the one-sided 
FP of forward DE must be proper (non-trivial and increasing) 
and we can lower bound the Battacharyya parameter of the 
resulting FP. By our choice of N 3 this FP (on [L./V3]) has 
Battacharyya parameter at least 4(1 — di/d r ). Now since w > 
2dfd 2 r we have N 3 = \(N + 1)^H + 1 > N + 1. This 

implies that N+ ^ N ^ > 5. Thus <8{v^) > ±(1 - d x jd r ). 
Clearly, < yp> (component-wise). 

Apply forward DE, when transmitting through BEC with
$\epsilon = 1$, to both constellations with a fixed boundary condition.
More precisely, we have for all $i \in [-N, N_3]$,
$u_i^{(\ell)} = g(u_{i-w+1}^{(\ell-1)}, \dots, u_{i+w-1}^{(\ell-1)})$ and
$v_i^{(\ell)} = g(v_{i-w+1}^{(\ell-1)}, \dots, v_{i+w-1}^{(\ell-1)})$.
We keep $u_i^{(\ell)} = \Delta_0$ and $v_i^{(\ell)} = \Delta_0$ fixed, for
all $i > N_3$ and $\ell \in \mathbb{N}$, and for $i < -N$ both the
constellations have sections fixed to $\Delta_{+\infty}$. Recall that
$u^{(0)}$ is equal to $u$ on $[-N, 0]$ and equal to 1 for the sections
$[1, N_3]$. Because of (52), we have $u^{(0)} \ge u^{(1)}$. From the
monotonicity of the DE operator we conclude that the sequence
$u^{(\ell)}$ is decreasing and since it is bounded from below it must
converge. Call this limit $u^{(\infty)}$.
We claim that the sequence $v^{(\ell)}$ is increasing in $\ell$ and since
it is bounded from above it must converge. Call this limit
$v^{(\infty)}$. Let us prove the claim that $v^{(\ell)}$ is increasing.
Indeed, for $i \in [-N, 0]$, $v_i^{(1)} \ge v_i^{(0)} = 0$, for
$i \in [1, N_3 - w + 1]$, $v_i^{(1)} = v_i^{(0)}$ (since $v^{(0)}$ is an FP
in that region) and for $i \in [N_3 - w + 2, N_3]$,
$v_i^{(1)} \ge v_i^{(0)}$ (since $v^{(0)}$ is an FP with
free boundary condition and hence replacing the boundary
with 1 can only increase the value under DE). Again, from
the monotonicity of DE we have that $v^{(\ell)}$ is increasing in $\ell$.
Since $v^{(\ell)}$ is increasing and proper we conclude that
$v^{(\infty)}$ exists and is proper. Further,
$v^{(\infty)} \le u^{(\infty)}$, since $v^{(0)} \le u^{(0)}$ and the
ordering is preserved under iterations of DE.
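The monotone behaviour of forward DE over the BEC with a fixed boundary can be observed numerically. The sketch below assumes the usual BEC specialization of the coupled update for $\epsilon = 1$, namely $g(x_{i-w+1}, \dots, x_{i+w-1}) = \big(1 - \frac{1}{w}\sum_{j=0}^{w-1} (1 - \frac{1}{w}\sum_{k=0}^{w-1} x_{i+j-k})^{d_r-1}\big)^{d_l-1}$, with sections to the left pinned to erasure probability 0 and sections to the right pinned to 1. The parameter values and function names are illustrative assumptions.

```python
def g(window, dl, dr, w):
    """BEC update for eps = 1; window holds x_{i-w+1}, ..., x_{i+w-1}."""
    assert len(window) == 2 * w - 1
    acc = 0.0
    for j in range(w):
        # The check node at position i + j averages variables i+j-k, k < w.
        avg = sum(window[(w - 1) + j - k] for k in range(w)) / w
        acc += (1.0 - avg) ** (dr - 1)
    return (1.0 - acc / w) ** (dl - 1)

def de_step(x, dl, dr, w):
    """One DE iteration with fixed boundary: 0 on the left, 1 on the right."""
    padded = [0.0] * (w - 1) + list(x) + [1.0] * (w - 1)
    return [g(padded[i:i + 2 * w - 1], dl, dr, w) for i in range(len(x))]

dl, dr, w, n = 3, 6, 3, 20      # hypothetical ensemble and chain length
u = [1.0] * n                   # start from the all-erasure constellation
history = [u]
for _ in range(50):
    u = de_step(u, dl, dr, w)
    history.append(u)
```

Starting from the largest constellation, every iterate is component-wise no larger than the previous one, so the sequence converges; starting from a constellation that DE can only increase gives an increasing sequence by the same monotonicity.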

Since *B(w (o °)) > ^{v^) > i(l - d ; /d r ) we claim that 

there must exists at least N$ = N 3 (l— 2 {i~s) ~ WC n' 3 '8^ ) sec " 



tions, from the right, with Battacharyya parameter greater than 
x u (l). Indeed, this can be obtained by considering the sections 
[1, JV3] of and then using v}°°> > vj-°\ More precisely, 
since the sections [1, N 3 ] of form a proper FP, if we let 
denote the number of sections with Battacharyya parameter 
less than S, then we get w- 0) ) < N^S + N 3 - N^. 

d 

Since £ JBCgi vf) > 1(1- we get N> 3 < 
and combining with the transition length Lemma [6T1 we get 
the expression for N5. Further, from the previous discussion, 
there are at least N4 values below 5 on the left. Thus, 
it is not hard to see that we can simultaneously choose 
6 > 0, w, L G N, K G N, N G N such that 

2(u;-l) < L, 
L < N 4 , 

L + w < K < N 5 < N 3 . 

We summarize, is a proper one-sided FP of DE for 

e = 1 with fixed boundary condition and < B(v < f^j +L ) < S 
and 93(1^1^) > x a (l). But we know from Theorem l47l 
that such a FP, $v^{(\infty)}$, must have a channel value close to
$\epsilon^A(d_l, d_r)$, the area threshold of the $(d_l, d_r)$-regular
ensemble when transmitting over the BEC. More precisely, applying
Theorem 47 we conclude that the entropy of the channel of
$v^{(\infty)}$ must be less than
$\epsilon^A(d_l, d_r) + c(d_l, d_r, \delta, w, K, L)$. Since
$\epsilon^A(d_l, d_r) \le \frac{d_l}{d_r} < 1$ we conclude that by choosing
$\delta$ small enough and $K, L, N$ large enough,
$c(d_l, d_r, \delta, w, K, L)$ can
be made arbitrarily small and hence the channel of $v^{(\infty)}$ is
strictly less than 1, leading to a contradiction since we started
with $\epsilon = 1$. This contradiction tells us that we cannot have
$\mathfrak{B}(U(\underline{\mathsf{x}}^*)) < x_u(1)/2$ when we apply the
Schauder theorem. Hence the FP must be a true FP of DE. ■